1 ############################################################
3 # perltidy - a perl script indenter and formatter
5 # Copyright (c) 2000-2007 by Steve Hancock
6 # Distributed under the GPL license agreement; see file COPYING
8 # This program is free software; you can redistribute it and/or modify
9 # it under the terms of the GNU General Public License as published by
10 # the Free Software Foundation; either version 2 of the License, or
11 # (at your option) any later version.
13 # This program is distributed in the hope that it will be useful,
14 # but WITHOUT ANY WARRANTY; without even the implied warranty of
15 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16 # GNU General Public License for more details.
18 # You should have received a copy of the GNU General Public License
19 # along with this program; if not, write to the Free Software
20 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
22 # For brief instructions instructions, try 'perltidy -h'.
23 # For more complete documentation, try 'man perltidy'
24 # or visit http://perltidy.sourceforge.net
26 # This script is an example of the default style. It was formatted with:
31 # Michael Cartmell supplied code for adaptation to VMS and helped with
33 # Hugh S. Myers supplied sub streamhandle and the supporting code to
34 # create a Perl::Tidy module which can operate on strings, arrays, etc.
35 # Yves Orton supplied coding to help detect Windows versions.
36 # Axel Rose supplied a patch for MacPerl.
37 # Sebastien Aperghis-Tramoni supplied a patch for the defined or operator.
38 # Dan Tyrell sent a patch for binary I/O.
39 # Many others have supplied key ideas, suggestions, and bug reports;
40 # see the CHANGES file.
42 ############################################################
45 use 5.004; # need IO::File from 5.004 or later
46 BEGIN { $^W = 1; } # turn on warnings
60 @ISA = qw( Exporter );
61 @EXPORT = qw( &perltidy );
67 ( $VERSION = q($Id: Tidy.pm,v 1.61 2007/04/24 13:31:15 perltidy Exp $) ) =~ s/^.*\s+(\d+)\/(\d+)\/(\d+).*$/$1$2$3/; # all one line for MakeMaker
72 # given filename and mode (r or w), create an object which:
73 # has a 'getline' method if mode='r', and
74 # has a 'print' method if mode='w'.
75 # The objects also need a 'close' method.
77 # How the object is made:
79 # if $filename is: Make object using:
80 # ---------------- -----------------
81 # '-' (STDIN if mode = 'r', STDOUT if mode='w')
83 # ARRAY ref Perl::Tidy::IOScalarArray (formerly IO::ScalarArray)
84 # STRING ref Perl::Tidy::IOScalar (formerly IO::Scalar)
86 # (check for 'print' method for 'w' mode)
87 # (check for 'getline' method for 'r' mode)
88 my $ref = ref( my $filename = shift );
95 if ( $ref eq 'ARRAY' ) {
96 $New = sub { Perl::Tidy::IOScalarArray->new(@_) };
98 elsif ( $ref eq 'SCALAR' ) {
99 $New = sub { Perl::Tidy::IOScalar->new(@_) };
103 # Accept an object with a getline method for reading. Note:
104 # IO::File is built-in and does not respond to the defined
105 # operator. If this causes trouble, the check can be
106 # skipped and we can just let it crash if there is no
108 if ( $mode =~ /[rR]/ ) {
109 if ( $ref eq 'IO::File' || defined &{ $ref . "::getline" } ) {
110 $New = sub { $filename };
113 $New = sub { undef };
115 ------------------------------------------------------------------------
116 No 'getline' method is defined for object of class $ref
117 Please check your call to Perl::Tidy::perltidy. Trace follows.
118 ------------------------------------------------------------------------
123 # Accept an object with a print method for writing.
124 # See note above about IO::File
125 if ( $mode =~ /[wW]/ ) {
126 if ( $ref eq 'IO::File' || defined &{ $ref . "::print" } ) {
127 $New = sub { $filename };
130 $New = sub { undef };
132 ------------------------------------------------------------------------
133 No 'print' method is defined for object of class $ref
134 Please check your call to Perl::Tidy::perltidy. Trace follows.
135 ------------------------------------------------------------------------
144 if ( $filename eq '-' ) {
145 $New = sub { $mode eq 'w' ? *STDOUT : *STDIN }
148 $New = sub { IO::File->new(@_) };
151 $fh = $New->( $filename, $mode )
152 or warn "Couldn't open file:$filename in mode:$mode : $!\n";
153 return $fh, ( $ref or $filename );
156 sub find_input_line_ending {
158 # Peek at a file and return first line ending character.
159 # Quietly return undef in case of any trouble.
160 my ($input_file) = @_;
163 # silently ignore input from object or stdin
164 if ( ref($input_file) || $input_file eq '-' ) {
167 open( INFILE, $input_file ) || return $ending;
171 read( INFILE, $buf, 1024 );
173 if ( $buf && $buf =~ /([\012\015]+)/ ) {
177 if ( $test =~ /^(\015\012)+$/ ) { $ending = "\015\012" }
180 elsif ( $test =~ /^\015+$/ ) { $ending = "\015" }
183 elsif ( $test =~ /^\012+$/ ) { $ending = "\012" }
197 # concatenate a path and file basename
198 # returns undef in case of error
200 BEGIN { eval "require File::Spec"; $missing_file_spec = $@; }
202 # use File::Spec if we can
203 unless ($missing_file_spec) {
204 return File::Spec->catfile(@_);
207 # Perl 5.004 systems may not have File::Spec so we'll make
208 # a simple try. We assume File::Basename is available.
209 # return undef if not successful.
211 my $path = join '/', @_;
212 my $test_file = $path . $name;
213 my ( $test_name, $test_path ) = fileparse($test_file);
214 return $test_file if ( $test_name eq $name );
215 return undef if ( $^O eq 'VMS' );
217 # this should work at least for Windows and Unix:
218 $test_file = $path . '/' . $name;
219 ( $test_name, $test_path ) = fileparse($test_file);
220 return $test_file if ( $test_name eq $name );
224 sub make_temporary_filename {
226 # Make a temporary filename.
228 # The POSIX tmpnam() function tends to be unreliable for non-unix
229 # systems (at least for the win32 systems that I've tested), so use
230 # a pre-defined name. A slight disadvantage of this is that two
231 # perltidy runs in the same working directory may conflict.
232 # However, the chance of that is small and managable by the user.
233 # An alternative would be to check for the file's existance and use,
234 # say .TMP0, .TMP1, etc, but that scheme has its own problems. So,
236 my $name = "perltidy.TMP";
237 if ( $^O =~ /win32|dos/i || $^O eq 'VMS' || $^O eq 'MacOs' ) {
240 eval "use POSIX qw(tmpnam)";
241 if ($@) { return $name }
244 # just make a couple of tries before giving up and using the default
246 my $tmpname = tmpnam();
247 my $fh = IO::File->new( $tmpname, O_RDWR | O_CREAT | O_EXCL );
257 # Here is a map of the flow of data from the input source to the output
260 # LineSource-->Tokenizer-->Formatter-->VerticalAligner-->FileWriter-->
261 # input groups output
262 # lines tokens lines of lines lines
265 # The names correspond to the package names responsible for the unit processes.
267 # The overall process is controlled by the "main" package.
269 # LineSource is the stream of input lines
271 # Tokenizer analyzes a line and breaks it into tokens, peeking ahead
272 # if necessary. A token is any section of the input line which should be
273 # manipulated as a single entity during formatting. For example, a single
274 # ',' character is a token, and so is an entire side comment. It handles
275 # the complexities of Perl syntax, such as distinguishing between '<<' as
276 # a shift operator and as a here-document, or distinguishing between '/'
277 # as a divide symbol and as a pattern delimiter.
279 # Formatter inserts and deletes whitespace between tokens, and breaks
280 # sequences of tokens at appropriate points as output lines. It bases its
281 # decisions on the default rules as modified by any command-line options.
283 # VerticalAligner collects groups of lines together and tries to line up
284 # certain tokens, such as '=>', '#', and '=' by adding whitespace.
286 # FileWriter simply writes lines to the output stream.
288 # The Logger package, not shown, records significant events and warning
289 # messages. It writes a .LOG file, which may be saved with a
290 # '-log' or a '-g' flag.
294 # variables needed by interrupt handler:
298 # this routine may be called to give a status report if interrupted. If a
299 # parameter is given, it will call exit with that parameter. This is no
300 # longer used because it works under Unix but not under Windows.
301 sub interrupt_handler {
303 my $exit_flag = shift;
304 print STDERR "perltidy interrupted";
306 my $input_line_number =
307 Perl::Tidy::Tokenizer::get_input_line_number();
308 print STDERR " at line $input_line_number";
312 if ( ref $input_file ) { print STDERR " of reference to:" }
313 else { print STDERR " of file:" }
314 print STDERR " $input_file";
317 exit $exit_flag if defined($exit_flag);
324 destination => undef,
331 dump_options => undef,
332 dump_options_type => undef,
333 dump_getopt_flags => undef,
334 dump_options_category => undef,
335 dump_options_range => undef,
336 dump_abbreviations => undef,
339 # don't overwrite callers ARGV
344 if ( my @bad_keys = grep { !exists $defaults{$_} } keys %input_hash ) {
346 my @good_keys = sort keys %defaults;
347 @bad_keys = sort @bad_keys;
349 ------------------------------------------------------------------------
350 Unknown perltidy parameter : (@bad_keys)
351 perltidy only understands : (@good_keys)
352 ------------------------------------------------------------------------
357 my $get_hash_ref = sub {
359 my $hash_ref = $input_hash{$key};
360 if ( defined($hash_ref) ) {
361 unless ( ref($hash_ref) eq 'HASH' ) {
362 my $what = ref($hash_ref);
364 $what ? "but is ref to $what" : "but is not a reference";
366 ------------------------------------------------------------------------
367 error in call to perltidy:
368 -$key must be reference to HASH $but_is
369 ------------------------------------------------------------------------
376 %input_hash = ( %defaults, %input_hash );
377 my $argv = $input_hash{'argv'};
378 my $destination_stream = $input_hash{'destination'};
379 my $errorfile_stream = $input_hash{'errorfile'};
380 my $logfile_stream = $input_hash{'logfile'};
381 my $perltidyrc_stream = $input_hash{'perltidyrc'};
382 my $source_stream = $input_hash{'source'};
383 my $stderr_stream = $input_hash{'stderr'};
384 my $user_formatter = $input_hash{'formatter'};
386 # various dump parameters
387 my $dump_options_type = $input_hash{'dump_options_type'};
388 my $dump_options = $get_hash_ref->('dump_options');
389 my $dump_getopt_flags = $get_hash_ref->('dump_getopt_flags');
390 my $dump_options_category = $get_hash_ref->('dump_options_category');
391 my $dump_abbreviations = $get_hash_ref->('dump_abbreviations');
392 my $dump_options_range = $get_hash_ref->('dump_options_range');
394 # validate dump_options_type
395 if ( defined($dump_options) ) {
396 unless ( defined($dump_options_type) ) {
397 $dump_options_type = 'perltidyrc';
399 unless ( $dump_options_type =~ /^(perltidyrc|full)$/ ) {
401 ------------------------------------------------------------------------
402 Please check value of -dump_options_type in call to perltidy;
403 saw: '$dump_options_type'
404 expecting: 'perltidyrc' or 'full'
405 ------------------------------------------------------------------------
411 $dump_options_type = "";
414 if ($user_formatter) {
416 # if the user defines a formatter, there is no output stream,
417 # but we need a null stream to keep coding simple
418 $destination_stream = Perl::Tidy::DevNull->new();
421 # see if ARGV is overridden
422 if ( defined($argv) ) {
424 my $rargv = ref $argv;
425 if ( $rargv eq 'SCALAR' ) { $argv = $$argv; $rargv = undef }
429 if ( $rargv eq 'ARRAY' ) {
434 ------------------------------------------------------------------------
435 Please check value of -argv in call to perltidy;
436 it must be a string or ref to ARRAY but is: $rargv
437 ------------------------------------------------------------------------
444 my ( $rargv, $msg ) = parse_args($argv);
447 Error parsing this string passed to to perltidy with 'argv':
455 # redirect STDERR if requested
456 if ($stderr_stream) {
457 my ( $fh_stderr, $stderr_file ) =
458 Perl::Tidy::streamhandle( $stderr_stream, 'w' );
459 if ($fh_stderr) { *STDERR = $fh_stderr }
462 ------------------------------------------------------------------------
463 Unable to redirect STDERR to $stderr_stream
464 Please check value of -stderr in call to perltidy
465 ------------------------------------------------------------------------
470 my $rpending_complaint;
471 $$rpending_complaint = "";
472 my $rpending_logfile_message;
473 $$rpending_logfile_message = "";
475 my ( $is_Windows, $Windows_type ) =
476 look_for_Windows($rpending_complaint);
478 # VMS file names are restricted to a 40.40 format, so we append _tdy
479 # instead of .tdy, etc. (but see also sub check_vms_filename)
482 if ( $^O eq 'VMS' ) {
488 $dot_pattern = '\.'; # must escape for use in regex
491 # handle command line options
492 my ( $rOpts, $config_file, $rraw_options, $saw_extrude, $roption_string,
493 $rexpansion, $roption_category, $roption_range )
494 = process_command_line(
495 $perltidyrc_stream, $is_Windows, $Windows_type,
496 $rpending_complaint, $dump_options_type,
499 # return or exit immediately after all dumps
502 # Getopt parameters and their flags
503 if ( defined($dump_getopt_flags) ) {
505 foreach my $op ( @{$roption_string} ) {
514 if ( $opt =~ /(.*)(!|=.*|:.*)$/ ) {
518 $dump_getopt_flags->{$opt} = $flag;
522 if ( defined($dump_options_category) ) {
524 %{$dump_options_category} = %{$roption_category};
527 if ( defined($dump_options_range) ) {
529 %{$dump_options_range} = %{$roption_range};
532 if ( defined($dump_abbreviations) ) {
534 %{$dump_abbreviations} = %{$rexpansion};
537 if ( defined($dump_options) ) {
539 %{$dump_options} = %{$rOpts};
542 return if ($quit_now);
544 # dump from command line
545 if ( $rOpts->{'dump-options'} ) {
546 dump_options( $rOpts, $roption_string );
550 check_options( $rOpts, $is_Windows, $Windows_type,
551 $rpending_complaint );
553 if ($user_formatter) {
554 $rOpts->{'format'} = 'user';
557 # there must be one entry here for every possible format
558 my %default_file_extension = (
564 # be sure we have a valid output format
565 unless ( exists $default_file_extension{ $rOpts->{'format'} } ) {
566 my $formats = join ' ',
567 sort map { "'" . $_ . "'" } keys %default_file_extension;
568 my $fmt = $rOpts->{'format'};
569 die "-format='$fmt' but must be one of: $formats\n";
572 my $output_extension =
573 make_extension( $rOpts->{'output-file-extension'},
574 $default_file_extension{ $rOpts->{'format'} }, $dot );
576 my $backup_extension =
577 make_extension( $rOpts->{'backup-file-extension'}, 'bak', $dot );
579 my $html_toc_extension =
580 make_extension( $rOpts->{'html-toc-extension'}, 'toc', $dot );
582 my $html_src_extension =
583 make_extension( $rOpts->{'html-src-extension'}, 'src', $dot );
585 # check for -b option;
586 my $in_place_modify = $rOpts->{'backup-and-modify-in-place'}
587 && $rOpts->{'format'} eq 'tidy' # silently ignore unless beautify mode
588 && @ARGV > 0; # silently ignore if standard input;
589 # this allows -b to be in a .perltidyrc file
590 # without error messages when running from an editor
592 # turn off -b with warnings in case of conflicts with other options
593 if ($in_place_modify) {
594 if ( $rOpts->{'standard-output'} ) {
595 warn "Ignoring -b; you may not use -b and -st together\n";
596 $in_place_modify = 0;
598 if ($destination_stream) {
600 "Ignoring -b; you may not specify a destination array and -b together\n";
601 $in_place_modify = 0;
603 if ($source_stream) {
605 "Ignoring -b; you may not specify a source array and -b together\n";
606 $in_place_modify = 0;
608 if ( $rOpts->{'outfile'} ) {
609 warn "Ignoring -b; you may not use -b and -o together\n";
610 $in_place_modify = 0;
612 if ( defined( $rOpts->{'output-path'} ) ) {
613 warn "Ignoring -b; you may not use -b and -opath together\n";
614 $in_place_modify = 0;
618 Perl::Tidy::Formatter::check_options($rOpts);
619 if ( $rOpts->{'format'} eq 'html' ) {
620 Perl::Tidy::HtmlWriter->check_options($rOpts);
623 # make the pattern of file extensions that we shouldn't touch
624 my $forbidden_file_extensions = "(($dot_pattern)(LOG|DEBUG|ERR|TEE)";
625 if ($output_extension) {
626 my $ext = quotemeta($output_extension);
627 $forbidden_file_extensions .= "|$ext";
629 if ( $in_place_modify && $backup_extension ) {
630 my $ext = quotemeta($backup_extension);
631 $forbidden_file_extensions .= "|$ext";
633 $forbidden_file_extensions .= ')$';
635 # Create a diagnostics object if requested;
636 # This is only useful for code development
637 my $diagnostics_object = undef;
638 if ( $rOpts->{'DIAGNOSTICS'} ) {
639 $diagnostics_object = Perl::Tidy::Diagnostics->new();
642 # no filenames should be given if input is from an array
643 if ($source_stream) {
646 "You may not specify any filenames when a source array is given\n";
649 # we'll stuff the source array into ARGV
650 unshift( @ARGV, $source_stream );
652 # No special treatment for source stream which is a filename.
653 # This will enable checks for binary files and other bad stuff.
654 $source_stream = undef unless ref($source_stream);
657 # use stdin by default if no source array and no args
659 unshift( @ARGV, '-' ) unless @ARGV;
662 # loop to process all files in argument list
663 my $number_of_files = @ARGV;
664 my $formatter = undef;
666 while ( $input_file = shift @ARGV ) {
668 my $input_file_permissions;
670 #---------------------------------------------------------------
671 # determine the input file name
672 #---------------------------------------------------------------
673 if ($source_stream) {
674 $fileroot = "perltidy";
676 elsif ( $input_file eq '-' ) { # '-' indicates input from STDIN
677 $fileroot = "perltidy"; # root name to use for .ERR, .LOG, etc
678 $in_place_modify = 0;
681 $fileroot = $input_file;
682 unless ( -e $input_file ) {
684 # file doesn't exist - check for a file glob
685 if ( $input_file =~ /([\?\*\[\{])/ ) {
687 # Windows shell may not remove quotes, so do it
688 my $input_file = $input_file;
689 if ( $input_file =~ /^\'(.+)\'$/ ) { $input_file = $1 }
690 if ( $input_file =~ /^\"(.+)\"$/ ) { $input_file = $1 }
691 my $pattern = fileglob_to_re($input_file);
693 if ( !$@ && opendir( DIR, './' ) ) {
695 grep { /$pattern/ && !-d $_ } readdir(DIR);
698 unshift @ARGV, @files;
703 print "skipping file: '$input_file': no matches found\n";
707 unless ( -f $input_file ) {
708 print "skipping file: $input_file: not a regular file\n";
712 unless ( ( -T $input_file ) || $rOpts->{'force-read-binary'} ) {
714 "skipping file: $input_file: Non-text (override with -f)\n";
718 # we should have a valid filename now
719 $fileroot = $input_file;
720 $input_file_permissions = ( stat $input_file )[2] & 07777;
722 if ( $^O eq 'VMS' ) {
723 ( $fileroot, $dot ) = check_vms_filename($fileroot);
726 # add option to change path here
727 if ( defined( $rOpts->{'output-path'} ) ) {
729 my ( $base, $old_path ) = fileparse($fileroot);
730 my $new_path = $rOpts->{'output-path'};
731 unless ( -d $new_path ) {
732 unless ( mkdir $new_path, 0777 ) {
733 die "unable to create directory $new_path: $!\n";
736 my $path = $new_path;
737 $fileroot = catfile( $path, $base );
740 ------------------------------------------------------------------------
741 Problem combining $new_path and $base to make a filename; check -opath
742 ------------------------------------------------------------------------
748 # Skip files with same extension as the output files because
749 # this can lead to a messy situation with files like
750 # script.tdy.tdy.tdy ... or worse problems ... when you
751 # rerun perltidy over and over with wildcard input.
754 && ( $input_file =~ /$forbidden_file_extensions/o
755 || $input_file eq 'DIAGNOSTICS' )
758 print "skipping file: $input_file: wrong extension\n";
762 # the 'source_object' supplies a method to read the input file
764 Perl::Tidy::LineSource->new( $input_file, $rOpts,
765 $rpending_logfile_message );
766 next unless ($source_object);
768 # register this file name with the Diagnostics package
769 $diagnostics_object->set_input_file($input_file)
770 if $diagnostics_object;
772 #---------------------------------------------------------------
773 # determine the output file name
774 #---------------------------------------------------------------
775 my $output_file = undef;
776 my $actual_output_extension;
778 if ( $rOpts->{'outfile'} ) {
780 if ( $number_of_files <= 1 ) {
782 if ( $rOpts->{'standard-output'} ) {
783 die "You may not use -o and -st together\n";
785 elsif ($destination_stream) {
787 "You may not specify a destination array and -o together\n";
789 elsif ( defined( $rOpts->{'output-path'} ) ) {
790 die "You may not specify -o and -opath together\n";
792 elsif ( defined( $rOpts->{'output-file-extension'} ) ) {
793 die "You may not specify -o and -oext together\n";
795 $output_file = $rOpts->{outfile};
797 # make sure user gives a file name after -o
798 if ( $output_file =~ /^-/ ) {
799 die "You must specify a valid filename after -o\n";
802 # do not overwrite input file with -o
803 if ( defined($input_file_permissions)
804 && ( $output_file eq $input_file ) )
807 "Use 'perltidy -b $input_file' to modify in-place\n";
811 die "You may not use -o with more than one input file\n";
814 elsif ( $rOpts->{'standard-output'} ) {
815 if ($destination_stream) {
817 "You may not specify a destination array and -st together\n";
821 if ( $number_of_files <= 1 ) {
824 die "You may not use -st with more than one input file\n";
827 elsif ($destination_stream) {
828 $output_file = $destination_stream;
830 elsif ($source_stream) { # source but no destination goes to stdout
833 elsif ( $input_file eq '-' ) {
837 if ($in_place_modify) {
838 $output_file = IO::File->new_tmpfile()
839 or die "cannot open temp file for -b option: $!\n";
842 $actual_output_extension = $output_extension;
843 $output_file = $fileroot . $output_extension;
847 # the 'sink_object' knows how to write the output file
848 my $tee_file = $fileroot . $dot . "TEE";
850 my $line_separator = $rOpts->{'output-line-ending'};
851 if ( $rOpts->{'preserve-line-endings'} ) {
852 $line_separator = find_input_line_ending($input_file);
855 # Eventually all I/O may be done with binmode, but for now it is
856 # only done when a user requests a particular line separator
857 # through the -ple or -ole flags
859 if ( defined($line_separator) ) { $binmode = 1 }
860 else { $line_separator = "\n" }
863 Perl::Tidy::LineSink->new( $output_file, $tee_file,
864 $line_separator, $rOpts, $rpending_logfile_message, $binmode );
866 #---------------------------------------------------------------
867 # initialize the error logger
868 #---------------------------------------------------------------
869 my $warning_file = $fileroot . $dot . "ERR";
870 if ($errorfile_stream) { $warning_file = $errorfile_stream }
871 my $log_file = $fileroot . $dot . "LOG";
872 if ($logfile_stream) { $log_file = $logfile_stream }
875 Perl::Tidy::Logger->new( $rOpts, $log_file, $warning_file,
877 write_logfile_header(
878 $rOpts, $logger_object, $config_file,
879 $rraw_options, $Windows_type
881 if ($$rpending_logfile_message) {
882 $logger_object->write_logfile_entry($$rpending_logfile_message);
884 if ($$rpending_complaint) {
885 $logger_object->complain($$rpending_complaint);
888 #---------------------------------------------------------------
889 # initialize the debug object, if any
890 #---------------------------------------------------------------
891 my $debugger_object = undef;
892 if ( $rOpts->{DEBUG} ) {
894 Perl::Tidy::Debugger->new( $fileroot . $dot . "DEBUG" );
897 #---------------------------------------------------------------
898 # create a formatter for this file : html writer or pretty printer
899 #---------------------------------------------------------------
901 # we have to delete any old formatter because, for safety,
902 # the formatter will check to see that there is only one.
905 if ($user_formatter) {
906 $formatter = $user_formatter;
908 elsif ( $rOpts->{'format'} eq 'html' ) {
910 Perl::Tidy::HtmlWriter->new( $fileroot, $output_file,
911 $actual_output_extension, $html_toc_extension,
912 $html_src_extension );
914 elsif ( $rOpts->{'format'} eq 'tidy' ) {
915 $formatter = Perl::Tidy::Formatter->new(
916 logger_object => $logger_object,
917 diagnostics_object => $diagnostics_object,
918 sink_object => $sink_object,
922 die "I don't know how to do -format=$rOpts->{'format'}\n";
925 unless ($formatter) {
926 die "Unable to continue with $rOpts->{'format'} formatting\n";
929 #---------------------------------------------------------------
930 # create the tokenizer for this file
931 #---------------------------------------------------------------
932 $tokenizer = undef; # must destroy old tokenizer
933 $tokenizer = Perl::Tidy::Tokenizer->new(
934 source_object => $source_object,
935 logger_object => $logger_object,
936 debugger_object => $debugger_object,
937 diagnostics_object => $diagnostics_object,
938 starting_level => $rOpts->{'starting-indentation-level'},
939 tabs => $rOpts->{'tabs'},
940 indent_columns => $rOpts->{'indent-columns'},
941 look_for_hash_bang => $rOpts->{'look-for-hash-bang'},
942 look_for_autoloader => $rOpts->{'look-for-autoloader'},
943 look_for_selfloader => $rOpts->{'look-for-selfloader'},
944 trim_qw => $rOpts->{'trim-qw'},
947 #---------------------------------------------------------------
949 #---------------------------------------------------------------
950 process_this_file( $tokenizer, $formatter );
952 #---------------------------------------------------------------
953 # close the input source and report errors
954 #---------------------------------------------------------------
955 $source_object->close_input_file();
957 # get file names to use for syntax check
958 my $ifname = $source_object->get_input_file_copy_name();
959 my $ofname = $sink_object->get_output_file_copy();
961 #---------------------------------------------------------------
962 # handle the -b option (backup and modify in-place)
963 #---------------------------------------------------------------
964 if ($in_place_modify) {
965 unless ( -f $input_file ) {
967 # oh, oh, no real file to backup ..
968 # shouldn't happen because of numerous preliminary checks
970 "problem with -b backing up input file '$input_file': not a file\n";
972 my $backup_name = $input_file . $backup_extension;
973 if ( -f $backup_name ) {
976 "unable to remove previous '$backup_name' for -b option; check permissions: $!\n";
978 rename( $input_file, $backup_name )
980 "problem renaming $input_file to $backup_name for -b option: $!\n";
981 $ifname = $backup_name;
983 seek( $output_file, 0, 0 )
984 or die "unable to rewind tmp file for -b option: $!\n";
986 my $fout = IO::File->new("> $input_file")
988 "problem opening $input_file for write for -b option; check directory permissions: $!\n";
991 while ( $line = $output_file->getline() ) {
995 $output_file = $input_file;
996 $ofname = $input_file;
999 #---------------------------------------------------------------
1000 # clean up and report errors
1001 #---------------------------------------------------------------
1002 $sink_object->close_output_file() if $sink_object;
1003 $debugger_object->close_debug_file() if $debugger_object;
1005 my $infile_syntax_ok = 0; # -1 no 0=don't know 1 yes
1008 if ($input_file_permissions) {
1010 # give output script same permissions as input script, but
1011 # make it user-writable or else we can't run perltidy again.
1012 # Thus we retain whatever executable flags were set.
1013 if ( $rOpts->{'format'} eq 'tidy' ) {
1014 chmod( $input_file_permissions | 0600, $output_file );
1017 # else use default permissions for html and any other format
1020 if ( $logger_object && $rOpts->{'check-syntax'} ) {
1022 check_syntax( $ifname, $ofname, $logger_object, $rOpts );
1026 $logger_object->finish( $infile_syntax_ok, $formatter )
1028 } # end of loop to process all files
1029 } # end of main program
1032 sub fileglob_to_re {
1034 # modified (corrected) from version in find2perl
1036 $x =~ s#([./^\$()])#\\$1#g; # escape special characters
1037 $x =~ s#\*#.*#g; # '*' -> '.*'
1038 $x =~ s#\?#.#g; # '?' -> '.'
1039 "^$x\\z"; # match whole word
1042 sub make_extension {
1044 # Make a file extension, including any leading '.' if necessary
1045 # The '.' may actually be an '_' under VMS
1046 my ( $extension, $default, $dot ) = @_;
1048 # Use the default if none specified
1049 $extension = $default unless ($extension);
1051 # Only extensions with these leading characters get a '.'
1052 # This rule gives the user some freedom
1053 if ( $extension =~ /^[a-zA-Z0-9]/ ) {
1054 $extension = $dot . $extension;
1059 sub write_logfile_header {
1060 my ( $rOpts, $logger_object, $config_file, $rraw_options, $Windows_type ) =
1062 $logger_object->write_logfile_entry(
1063 "perltidy version $VERSION log file on a $^O system, OLD_PERL_VERSION=$]\n"
1065 if ($Windows_type) {
1066 $logger_object->write_logfile_entry("Windows type is $Windows_type\n");
1068 my $options_string = join( ' ', @$rraw_options );
1071 $logger_object->write_logfile_entry(
1072 "Found Configuration File >>> $config_file \n");
1074 $logger_object->write_logfile_entry(
1075 "Configuration and command line parameters for this run:\n");
1076 $logger_object->write_logfile_entry("$options_string\n");
1078 if ( $rOpts->{'DEBUG'} || $rOpts->{'show-options'} ) {
1079 $rOpts->{'logfile'} = 1; # force logfile to be saved
1080 $logger_object->write_logfile_entry(
1081 "Final parameter set for this run\n");
1082 $logger_object->write_logfile_entry(
1083 "------------------------------------\n");
1085 foreach ( keys %{$rOpts} ) {
1086 $logger_object->write_logfile_entry( '--' . "$_=$rOpts->{$_}\n" );
1088 $logger_object->write_logfile_entry(
1089 "------------------------------------\n");
1091 $logger_object->write_logfile_entry(
1092 "To find error messages search for 'WARNING' with your editor\n");
1095 sub generate_options {
1097 ######################################################################
1098 # Generate and return references to:
1099 # @option_string - the list of options to be passed to Getopt::Long
1100 # @defaults - the list of default options
1101 # %expansion - a hash showing how all abbreviations are expanded
1102 # %category - a hash giving the general category of each option
1103 # %option_range - a hash giving the valid ranges of certain options
1105 # Note: a few options are not documented in the man page and usage
1106 # message. This is because these are experimental or debug options and
1107 # may or may not be retained in future versions.
1109 # Here are the undocumented flags as far as I know. Any of them
1110 # may disappear at any time. They are mainly for fine-tuning
1113 # fll --> fuzzy-line-length # a trivial parameter which gets
1114 # turned off for the extrude option
1115 # which is mainly for debugging
1116 # chk --> check-multiline-quotes # check for old bug; to be deleted
1117 # scl --> short-concatenation-item-length # helps break at '.'
1118 # recombine # for debugging line breaks
1119 # valign # for debugging vertical alignment
1120 # I --> DIAGNOSTICS # for debugging
1121 ######################################################################
1123 # here is a summary of the Getopt codes:
1124 # <none> does not take an argument
1125 # =s takes a mandatory string
1126 # :s takes an optional string (DO NOT USE - filenames will get eaten up)
1127 # =i takes a mandatory integer
1128 # :i takes an optional integer (NOT RECOMMENDED - can cause trouble)
1129 # ! does not take an argument and may be negated
1130 # i.e., -foo and -nofoo are allowed
1131 # a double dash signals the end of the options list
1133 #---------------------------------------------------------------
1134 # Define the option string passed to GetOptions.
1135 #---------------------------------------------------------------
1137 my @option_string = ();
1139 my %option_category = ();
1140 my %option_range = ();
1141 my $rexpansion = \%expansion;
1143 # names of categories in manual
1144 # leading integers will allow sorting
1145 my @category_name = (
1147 '1. Basic formatting options',
1148 '2. Code indentation control',
1149 '3. Whitespace control',
1150 '4. Comment controls',
1151 '5. Linebreak controls',
1152 '6. Controlling list formatting',
1153 '7. Retaining or ignoring existing line breaks',
1154 '8. Blank line control',
1155 '9. Other controls',
1157 '11. pod2html options',
1158 '12. Controlling HTML properties',
1162 # These options are parsed directly by perltidy:
1165 # However, they are included in the option set so that they will
1166 # be seen in the options dump.
1168 # These long option names have no abbreviations or are treated specially
1169 @option_string = qw(
1178 my $category = 13; # Debugging
1179 foreach (@option_string) {
1180 my $opt = $_; # must avoid changing the actual flag
1182 $option_category{$opt} = $category_name[$category];
1185 $category = 11; # HTML
1186 $option_category{html} = $category_name[$category];
1188 # routine to install and check options
1189 my $add_option = sub {
1190 my ( $long_name, $short_name, $flag ) = @_;
1191 push @option_string, $long_name . $flag;
1192 $option_category{$long_name} = $category_name[$category];
1194 if ( $expansion{$short_name} ) {
1195 my $existing_name = $expansion{$short_name}[0];
1197 "redefining abbreviation $short_name for $long_name; already used for $existing_name\n";
1199 $expansion{$short_name} = [$long_name];
1200 if ( $flag eq '!' ) {
1201 my $nshort_name = 'n' . $short_name;
1202 my $nolong_name = 'no' . $long_name;
1203 if ( $expansion{$nshort_name} ) {
1204 my $existing_name = $expansion{$nshort_name}[0];
1206 "attempting to redefine abbreviation $nshort_name for $nolong_name; already used for $existing_name\n";
1208 $expansion{$nshort_name} = [$nolong_name];
1213 # Install long option names which have a simple abbreviation.
1214 # Options with code '!' get standard negation ('no' for long names,
1215 # 'n' for abbreviations). Categories follow the manual.
1217 ###########################
1218 $category = 0; # I/O_Control
1219 ###########################
1220 $add_option->( 'backup-and-modify-in-place', 'b', '!' );
1221 $add_option->( 'backup-file-extension', 'bext', '=s' );
1222 $add_option->( 'force-read-binary', 'f', '!' );
1223 $add_option->( 'format', 'fmt', '=s' );
1224 $add_option->( 'logfile', 'log', '!' );
1225 $add_option->( 'logfile-gap', 'g', ':i' );
1226 $add_option->( 'outfile', 'o', '=s' );
1227 $add_option->( 'output-file-extension', 'oext', '=s' );
1228 $add_option->( 'output-path', 'opath', '=s' );
1229 $add_option->( 'profile', 'pro', '=s' );
1230 $add_option->( 'quiet', 'q', '!' );
1231 $add_option->( 'standard-error-output', 'se', '!' );
1232 $add_option->( 'standard-output', 'st', '!' );
1233 $add_option->( 'warning-output', 'w', '!' );
1235 # options which are both toggle switches and values moved here
1236 # to hide from tidyview (which does not show category 0 flags):
1237 # -ole moved here from category 1
1238 # -sil moved here from category 2
1239 $add_option->( 'output-line-ending', 'ole', '=s' );
1240 $add_option->( 'starting-indentation-level', 'sil', '=i' );
1242 ########################################
1243 $category = 1; # Basic formatting options
1244 ########################################
1245 $add_option->( 'check-syntax', 'syn', '!' );
1246 $add_option->( 'entab-leading-whitespace', 'et', '=i' );
1247 $add_option->( 'indent-columns', 'i', '=i' );
1248 $add_option->( 'maximum-line-length', 'l', '=i' );
1249 $add_option->( 'perl-syntax-check-flags', 'pscf', '=s' );
1250 $add_option->( 'preserve-line-endings', 'ple', '!' );
1251 $add_option->( 'tabs', 't', '!' );
1253 ########################################
1254 $category = 2; # Code indentation control
1255 ########################################
1256 $add_option->( 'continuation-indentation', 'ci', '=i' );
1257 $add_option->( 'line-up-parentheses', 'lp', '!' );
1258 $add_option->( 'outdent-keyword-list', 'okwl', '=s' );
1259 $add_option->( 'outdent-keywords', 'okw', '!' );
1260 $add_option->( 'outdent-labels', 'ola', '!' );
1261 $add_option->( 'outdent-long-quotes', 'olq', '!' );
1262 $add_option->( 'indent-closing-brace', 'icb', '!' );
1263 $add_option->( 'closing-token-indentation', 'cti', '=i' );
1264 $add_option->( 'closing-paren-indentation', 'cpi', '=i' );
1265 $add_option->( 'closing-brace-indentation', 'cbi', '=i' );
1266 $add_option->( 'closing-square-bracket-indentation', 'csbi', '=i' );
1267 $add_option->( 'brace-left-and-indent', 'bli', '!' );
1268 $add_option->( 'brace-left-and-indent-list', 'blil', '=s' );
1270 ########################################
1271 $category = 3; # Whitespace control
1272 ########################################
1273 $add_option->( 'add-semicolons', 'asc', '!' );
1274 $add_option->( 'add-whitespace', 'aws', '!' );
1275 $add_option->( 'block-brace-tightness', 'bbt', '=i' );
1276 $add_option->( 'brace-tightness', 'bt', '=i' );
1277 $add_option->( 'delete-old-whitespace', 'dws', '!' );
1278 $add_option->( 'delete-semicolons', 'dsm', '!' );
1279 $add_option->( 'nospace-after-keyword', 'nsak', '=s' );
1280 $add_option->( 'nowant-left-space', 'nwls', '=s' );
1281 $add_option->( 'nowant-right-space', 'nwrs', '=s' );
1282 $add_option->( 'paren-tightness', 'pt', '=i' );
1283 $add_option->( 'space-after-keyword', 'sak', '=s' );
1284 $add_option->( 'space-for-semicolon', 'sfs', '!' );
1285 $add_option->( 'space-function-paren', 'sfp', '!' );
1286 $add_option->( 'space-keyword-paren', 'skp', '!' );
1287 $add_option->( 'space-terminal-semicolon', 'sts', '!' );
1288 $add_option->( 'square-bracket-tightness', 'sbt', '=i' );
1289 $add_option->( 'square-bracket-vertical-tightness', 'sbvt', '=i' );
1290 $add_option->( 'square-bracket-vertical-tightness-closing', 'sbvtc', '=i' );
1291 $add_option->( 'trim-qw', 'tqw', '!' );
1292 $add_option->( 'want-left-space', 'wls', '=s' );
1293 $add_option->( 'want-right-space', 'wrs', '=s' );
1295 ########################################
1296 $category = 4; # Comment controls
1297 ########################################
1298 $add_option->( 'closing-side-comment-else-flag', 'csce', '=i' );
1299 $add_option->( 'closing-side-comment-interval', 'csci', '=i' );
1300 $add_option->( 'closing-side-comment-list', 'cscl', '=s' );
1301 $add_option->( 'closing-side-comment-maximum-text', 'csct', '=i' );
1302 $add_option->( 'closing-side-comment-prefix', 'cscp', '=s' );
1303 $add_option->( 'closing-side-comment-warnings', 'cscw', '!' );
1304 $add_option->( 'closing-side-comments', 'csc', '!' );
1305 $add_option->( 'format-skipping', 'fs', '!' );
1306 $add_option->( 'format-skipping-begin', 'fsb', '=s' );
1307 $add_option->( 'format-skipping-end', 'fse', '=s' );
1308 $add_option->( 'hanging-side-comments', 'hsc', '!' );
1309 $add_option->( 'indent-block-comments', 'ibc', '!' );
1310 $add_option->( 'indent-spaced-block-comments', 'isbc', '!' );
1311 $add_option->( 'minimum-space-to-comment', 'msc', '=i' );
1312 $add_option->( 'outdent-long-comments', 'olc', '!' );
1313 $add_option->( 'outdent-static-block-comments', 'osbc', '!' );
1314 $add_option->( 'static-block-comment-prefix', 'sbcp', '=s' );
1315 $add_option->( 'static-block-comments', 'sbc', '!' );
1316 $add_option->( 'static-side-comment-prefix', 'sscp', '=s' );
1317 $add_option->( 'static-side-comments', 'ssc', '!' );
1319 ########################################
1320 $category = 5; # Linebreak controls
1321 ########################################
1322 $add_option->( 'add-newlines', 'anl', '!' );
1323 $add_option->( 'block-brace-vertical-tightness', 'bbvt', '=i' );
1324 $add_option->( 'block-brace-vertical-tightness-list', 'bbvtl', '=s' );
1325 $add_option->( 'brace-vertical-tightness', 'bvt', '=i' );
1326 $add_option->( 'brace-vertical-tightness-closing', 'bvtc', '=i' );
1327 $add_option->( 'cuddled-else', 'ce', '!' );
1328 $add_option->( 'delete-old-newlines', 'dnl', '!' );
1329 $add_option->( 'opening-brace-always-on-right', 'bar', '!' );
1330 $add_option->( 'opening-brace-on-new-line', 'bl', '!' );
1331 $add_option->( 'opening-hash-brace-right', 'ohbr', '!' );
1332 $add_option->( 'opening-paren-right', 'opr', '!' );
1333 $add_option->( 'opening-square-bracket-right', 'osbr', '!' );
1334 $add_option->( 'opening-sub-brace-on-new-line', 'sbl', '!' );
1335 $add_option->( 'paren-vertical-tightness', 'pvt', '=i' );
1336 $add_option->( 'paren-vertical-tightness-closing', 'pvtc', '=i' );
1337 $add_option->( 'stack-closing-hash-brace', 'schb', '!' );
1338 $add_option->( 'stack-closing-paren', 'scp', '!' );
1339 $add_option->( 'stack-closing-square-bracket', 'scsb', '!' );
1340 $add_option->( 'stack-opening-hash-brace', 'sohb', '!' );
1341 $add_option->( 'stack-opening-paren', 'sop', '!' );
1342 $add_option->( 'stack-opening-square-bracket', 'sosb', '!' );
1343 $add_option->( 'vertical-tightness', 'vt', '=i' );
1344 $add_option->( 'vertical-tightness-closing', 'vtc', '=i' );
1345 $add_option->( 'want-break-after', 'wba', '=s' );
1346 $add_option->( 'want-break-before', 'wbb', '=s' );
1348 ########################################
1349 $category = 6; # Controlling list formatting
1350 ########################################
1351 $add_option->( 'break-at-old-comma-breakpoints', 'boc', '!' );
1352 $add_option->( 'comma-arrow-breakpoints', 'cab', '=i' );
1353 $add_option->( 'maximum-fields-per-table', 'mft', '=i' );
1355 ########################################
1356 $category = 7; # Retaining or ignoring existing line breaks
1357 ########################################
1358 $add_option->( 'break-at-old-keyword-breakpoints', 'bok', '!' );
1359 $add_option->( 'break-at-old-logical-breakpoints', 'bol', '!' );
1360 $add_option->( 'break-at-old-ternary-breakpoints', 'bot', '!' );
1361 $add_option->( 'ignore-old-breakpoints', 'iob', '!' );
1363 ########################################
1364 $category = 8; # Blank line control
1365 ########################################
1366 $add_option->( 'blanks-before-blocks', 'bbb', '!' );
1367 $add_option->( 'blanks-before-comments', 'bbc', '!' );
1368 $add_option->( 'blanks-before-subs', 'bbs', '!' );
1369 $add_option->( 'long-block-line-count', 'lbl', '=i' );
1370 $add_option->( 'maximum-consecutive-blank-lines', 'mbl', '=i' );
1371 $add_option->( 'swallow-optional-blank-lines', 'sob', '!' );
1373 ########################################
1374 $category = 9; # Other controls
1375 ########################################
1376 $add_option->( 'delete-block-comments', 'dbc', '!' );
1377 $add_option->( 'delete-closing-side-comments', 'dcsc', '!' );
1378 $add_option->( 'delete-pod', 'dp', '!' );
1379 $add_option->( 'delete-side-comments', 'dsc', '!' );
1380 $add_option->( 'tee-block-comments', 'tbc', '!' );
1381 $add_option->( 'tee-pod', 'tp', '!' );
1382 $add_option->( 'tee-side-comments', 'tsc', '!' );
1383 $add_option->( 'look-for-autoloader', 'lal', '!' );
1384 $add_option->( 'look-for-hash-bang', 'x', '!' );
1385 $add_option->( 'look-for-selfloader', 'lsl', '!' );
1386 $add_option->( 'pass-version-line', 'pvl', '!' );
1388 ########################################
1389 $category = 13; # Debugging
1390 ########################################
1391 $add_option->( 'DEBUG', 'D', '!' );
1392 $add_option->( 'DIAGNOSTICS', 'I', '!' );
1393 $add_option->( 'check-multiline-quotes', 'chk', '!' );
1394 $add_option->( 'dump-defaults', 'ddf', '!' );
1395 $add_option->( 'dump-long-names', 'dln', '!' );
1396 $add_option->( 'dump-options', 'dop', '!' );
1397 $add_option->( 'dump-profile', 'dpro', '!' );
1398 $add_option->( 'dump-short-names', 'dsn', '!' );
1399 $add_option->( 'dump-token-types', 'dtt', '!' );
1400 $add_option->( 'dump-want-left-space', 'dwls', '!' );
1401 $add_option->( 'dump-want-right-space', 'dwrs', '!' );
1402 $add_option->( 'fuzzy-line-length', 'fll', '!' );
1403 $add_option->( 'help', 'h', '' );
1404 $add_option->( 'short-concatenation-item-length', 'scl', '=i' );
1405 $add_option->( 'show-options', 'opt', '!' );
1406 $add_option->( 'version', 'v', '' );
1408 #---------------------------------------------------------------------
1410 # The Perl::Tidy::HtmlWriter will add its own options to the string
1411 Perl::Tidy::HtmlWriter->make_getopt_long_names( \@option_string );
1413 ########################################
1414 # Set categories 10, 11, 12
1415 ########################################
1416 # Based on their known order
1417 $category = 12; # HTML properties
1418 foreach my $opt (@option_string) {
1419 my $long_name = $opt;
1420 $long_name =~ s/(!|=.*|:.*)$//;
1421 unless ( defined( $option_category{$long_name} ) ) {
1422 if ( $long_name =~ /^html-linked/ ) {
1423 $category = 10; # HTML options
1425 elsif ( $long_name =~ /^pod2html/ ) {
1426 $category = 11; # Pod2html
1428 $option_category{$long_name} = $category_name[$category];
1432 #---------------------------------------------------------------
1433 # Assign valid ranges to certain options
1434 #---------------------------------------------------------------
1435 # In the future, these may be used to make preliminary checks
1436 # hash keys are long names
1437 # If key or value is undefined:
1438 # strings may have any value
1439 # integer ranges are >=0
1440 # If value is defined:
1441 # value is [qw(any valid words)] for strings
1442 # value is [min, max] for integers
1443 # if min is undefined, there is no lower limit
1444 # if max is undefined, there is no upper limit
1445 # Parameters not listed here have defaults
1447 'format' => [ 'tidy', 'html', 'user' ],
1448 'output-line-ending' => [ 'dos', 'win', 'mac', 'unix' ],
1450 'block-brace-tightness' => [ 0, 2 ],
1451 'brace-tightness' => [ 0, 2 ],
1452 'paren-tightness' => [ 0, 2 ],
1453 'square-bracket-tightness' => [ 0, 2 ],
1455 'block-brace-vertical-tightness' => [ 0, 2 ],
1456 'brace-vertical-tightness' => [ 0, 2 ],
1457 'brace-vertical-tightness-closing' => [ 0, 2 ],
1458 'paren-vertical-tightness' => [ 0, 2 ],
1459 'paren-vertical-tightness-closing' => [ 0, 2 ],
1460 'square-bracket-vertical-tightness' => [ 0, 2 ],
1461 'square-bracket-vertical-tightness-closing' => [ 0, 2 ],
1462 'vertical-tightness' => [ 0, 2 ],
1463 'vertical-tightness-closing' => [ 0, 2 ],
1465 'closing-brace-indentation' => [ 0, 3 ],
1466 'closing-paren-indentation' => [ 0, 3 ],
1467 'closing-square-bracket-indentation' => [ 0, 3 ],
1468 'closing-token-indentation' => [ 0, 3 ],
1470 'closing-side-comment-else-flag' => [ 0, 2 ],
1471 'comma-arrow-breakpoints' => [ 0, 3 ],
1474 # Note: we could actually allow negative ci if someone really wants it:
1475 # $option_range{'continuation-indentation'} = [ undef, undef ];
1477 #---------------------------------------------------------------
1478 # Assign default values to the above options here, except
1479 # for 'outfile' and 'help'.
1480 # These settings should approximate the perlstyle(1) suggestions.
1481 #---------------------------------------------------------------
1486 blanks-before-blocks
1487 blanks-before-comments
1489 block-brace-tightness=0
1490 block-brace-vertical-tightness=0
1492 brace-vertical-tightness-closing=0
1493 brace-vertical-tightness=0
1494 break-at-old-logical-breakpoints
1495 break-at-old-ternary-breakpoints
1496 break-at-old-keyword-breakpoints
1497 comma-arrow-breakpoints=1
1499 closing-side-comment-interval=6
1500 closing-side-comment-maximum-text=20
1501 closing-side-comment-else-flag=0
1502 closing-paren-indentation=0
1503 closing-brace-indentation=0
1504 closing-square-bracket-indentation=0
1505 continuation-indentation=2
1509 hanging-side-comments
1510 indent-block-comments
1512 long-block-line-count=8
1515 maximum-consecutive-blank-lines=1
1516 maximum-fields-per-table=0
1517 maximum-line-length=80
1518 minimum-space-to-comment=4
1519 nobrace-left-and-indent
1521 nodelete-old-whitespace
1526 nostatic-side-comments
1527 noswallow-optional-blank-lines
1532 outdent-long-comments
1534 paren-vertical-tightness-closing=0
1535 paren-vertical-tightness=0
1539 short-concatenation-item-length=8
1541 square-bracket-tightness=1
1542 square-bracket-vertical-tightness-closing=0
1543 square-bracket-vertical-tightness=0
1544 static-block-comments
1547 backup-file-extension=bak
1551 html-table-of-contents
1555 push @defaults, "perl-syntax-check-flags=-c -T";
1557 #---------------------------------------------------------------
1558 # Define abbreviations which will be expanded into the above primitives.
1559 # These may be defined recursively.
1560 #---------------------------------------------------------------
1563 'freeze-newlines' => [qw(noadd-newlines nodelete-old-newlines)],
1564 'fnl' => [qw(freeze-newlines)],
1565 'freeze-whitespace' => [qw(noadd-whitespace nodelete-old-whitespace)],
1566 'fws' => [qw(freeze-whitespace)],
1567 'indent-only' => [qw(freeze-newlines freeze-whitespace)],
1568 'outdent-long-lines' => [qw(outdent-long-quotes outdent-long-comments)],
1569 'nooutdent-long-lines' =>
1570 [qw(nooutdent-long-quotes nooutdent-long-comments)],
1571 'noll' => [qw(nooutdent-long-lines)],
1572 'io' => [qw(indent-only)],
1573 'delete-all-comments' =>
1574 [qw(delete-block-comments delete-side-comments delete-pod)],
1575 'nodelete-all-comments' =>
1576 [qw(nodelete-block-comments nodelete-side-comments nodelete-pod)],
1577 'dac' => [qw(delete-all-comments)],
1578 'ndac' => [qw(nodelete-all-comments)],
1579 'gnu' => [qw(gnu-style)],
1580 'pbp' => [qw(perl-best-practices)],
1581 'tee-all-comments' =>
1582 [qw(tee-block-comments tee-side-comments tee-pod)],
1583 'notee-all-comments' =>
1584 [qw(notee-block-comments notee-side-comments notee-pod)],
1585 'tac' => [qw(tee-all-comments)],
1586 'ntac' => [qw(notee-all-comments)],
1587 'html' => [qw(format=html)],
1588 'nhtml' => [qw(format=tidy)],
1589 'tidy' => [qw(format=tidy)],
1591 'break-after-comma-arrows' => [qw(cab=0)],
1592 'nobreak-after-comma-arrows' => [qw(cab=1)],
1593 'baa' => [qw(cab=0)],
1594 'nbaa' => [qw(cab=1)],
1596 'break-at-old-trinary-breakpoints' => [qw(bot)],
1598 'cti=0' => [qw(cpi=0 cbi=0 csbi=0)],
1599 'cti=1' => [qw(cpi=1 cbi=1 csbi=1)],
1600 'cti=2' => [qw(cpi=2 cbi=2 csbi=2)],
1601 'icp' => [qw(cpi=2 cbi=2 csbi=2)],
1602 'nicp' => [qw(cpi=0 cbi=0 csbi=0)],
1604 'closing-token-indentation=0' => [qw(cpi=0 cbi=0 csbi=0)],
1605 'closing-token-indentation=1' => [qw(cpi=1 cbi=1 csbi=1)],
1606 'closing-token-indentation=2' => [qw(cpi=2 cbi=2 csbi=2)],
1607 'indent-closing-paren' => [qw(cpi=2 cbi=2 csbi=2)],
1608 'noindent-closing-paren' => [qw(cpi=0 cbi=0 csbi=0)],
1610 'vt=0' => [qw(pvt=0 bvt=0 sbvt=0)],
1611 'vt=1' => [qw(pvt=1 bvt=1 sbvt=1)],
1612 'vt=2' => [qw(pvt=2 bvt=2 sbvt=2)],
1614 'vertical-tightness=0' => [qw(pvt=0 bvt=0 sbvt=0)],
1615 'vertical-tightness=1' => [qw(pvt=1 bvt=1 sbvt=1)],
1616 'vertical-tightness=2' => [qw(pvt=2 bvt=2 sbvt=2)],
1618 'vtc=0' => [qw(pvtc=0 bvtc=0 sbvtc=0)],
1619 'vtc=1' => [qw(pvtc=1 bvtc=1 sbvtc=1)],
1620 'vtc=2' => [qw(pvtc=2 bvtc=2 sbvtc=2)],
1622 'vertical-tightness-closing=0' => [qw(pvtc=0 bvtc=0 sbvtc=0)],
1623 'vertical-tightness-closing=1' => [qw(pvtc=1 bvtc=1 sbvtc=1)],
1624 'vertical-tightness-closing=2' => [qw(pvtc=2 bvtc=2 sbvtc=2)],
1626 'otr' => [qw(opr ohbr osbr)],
1627 'opening-token-right' => [qw(opr ohbr osbr)],
1628 'notr' => [qw(nopr nohbr nosbr)],
1629 'noopening-token-right' => [qw(nopr nohbr nosbr)],
1631 'sot' => [qw(sop sohb sosb)],
1632 'nsot' => [qw(nsop nsohb nsosb)],
1633 'stack-opening-tokens' => [qw(sop sohb sosb)],
1634 'nostack-opening-tokens' => [qw(nsop nsohb nsosb)],
1636 'sct' => [qw(scp schb scsb)],
1637 'stack-closing-tokens' => => [qw(scp schb scsb)],
1638 'nsct' => [qw(nscp nschb nscsb)],
1639 'nostack-opening-tokens' => [qw(nscp nschb nscsb)],
1641 # 'mangle' originally deleted pod and comments, but to keep it
1642 # reversible, it no longer does. But if you really want to
1643 # delete them, just use:
1646 # An interesting use for 'mangle' is to do this:
1647 # perltidy -mangle myfile.pl -st | perltidy -o myfile.pl.new
1648 # which will form as many one-line blocks as possible
1654 delete-old-whitespace
1657 maximum-consecutive-blank-lines=0
1658 maximum-line-length=100000
1662 noblanks-before-blocks
1663 noblanks-before-subs
1668 # 'extrude' originally deleted pod and comments, but to keep it
1669 # reversible, it no longer does. But if you really want to
1670 # delete them, just use
1673 # An interesting use for 'extrude' is to do this:
1674 # perltidy -extrude myfile.pl -st | perltidy -o myfile.pl.new
1675 # which will break up all one-line blocks.
1682 delete-old-whitespace
1685 maximum-consecutive-blank-lines=0
1686 maximum-line-length=1
1689 noblanks-before-blocks
1690 noblanks-before-subs
1697 # this style tries to follow the GNU Coding Standards (which do
1698 # not really apply to perl but which are followed by some perl
1702 lp bl noll pt=2 bt=2 sbt=2 cpi=1 csbi=1 cbi=1
1706 # Style suggested in Damian Conway's Perl Best Practices
1707 'perl-best-practices' => [
1708 qw(l=78 i=4 ci=4 st se vt=2 cti=0 pt=1 bt=1 sbt=1 bbt=1 nsfs nolq),
1709 q(wbb=% + - * / x != == >= <= =~ !~ < > | & = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=)
1712 # Additional styles can be added here
1715 Perl::Tidy::HtmlWriter->make_abbreviated_names( \%expansion );
1717 # Uncomment next line to dump all expansions for debugging:
1718 # dump_short_names(\%expansion);
1720 \@option_string, \@defaults, \%expansion,
1721 \%option_category, \%option_range
1724 } # end of generate_options
1726 sub process_command_line {
1729 $perltidyrc_stream, $is_Windows, $Windows_type,
1730 $rpending_complaint, $dump_options_type
1736 $roption_string, $rdefaults, $rexpansion,
1737 $roption_category, $roption_range
1738 ) = generate_options();
1740 #---------------------------------------------------------------
1741 # set the defaults by passing the above list through GetOptions
1742 #---------------------------------------------------------------
1748 # do not load the defaults if we are just dumping perltidyrc
1749 unless ( $dump_options_type eq 'perltidyrc' ) {
1750 for $i (@$rdefaults) { push @ARGV, "--" . $i }
1753 # Patch to save users Getopt::Long configuration
1754 # and set to Getopt::Long defaults. Use eval to avoid
1755 # breaking old versions of Perl without these routines.
1757 eval { $glc = Getopt::Long::Configure() };
1759 eval { Getopt::Long::ConfigDefaults() };
1761 else { $glc = undef }
1763 if ( !GetOptions( \%Opts, @$roption_string ) ) {
1764 die "Programming Bug: error in setting default options";
1767 # Patch to put the previous Getopt::Long configuration back
1768 eval { Getopt::Long::Configure($glc) } if defined $glc;
1772 my @raw_options = ();
1773 my $config_file = "";
1774 my $saw_ignore_profile = 0;
1775 my $saw_extrude = 0;
1776 my $saw_dump_profile = 0;
1779 #---------------------------------------------------------------
1780 # Take a first look at the command-line parameters. Do as many
1781 # immediate dumps as possible, which can avoid confusion if the
1782 # perltidyrc file has an error.
1783 #---------------------------------------------------------------
1784 foreach $i (@ARGV) {
1787 if ( $i =~ /^-(npro|noprofile|no-profile)$/ ) {
1788 $saw_ignore_profile = 1;
1791 # note: this must come before -pro and -profile, below:
1792 elsif ( $i =~ /^-(dump-profile|dpro)$/ ) {
1793 $saw_dump_profile = 1;
1795 elsif ( $i =~ /^-(pro|profile)=(.+)/ ) {
1798 "Only one -pro=filename allowed, using '$2' instead of '$config_file'\n";
1801 unless ( -e $config_file ) {
1802 warn "cannot find file given with -pro=$config_file: $!\n";
1806 elsif ( $i =~ /^-(pro|profile)=?$/ ) {
1807 die "usage: -pro=filename or --profile=filename, no spaces\n";
1809 elsif ( $i =~ /^-extrude$/ ) {
1812 elsif ( $i =~ /^-(help|h|HELP|H)$/ ) {
1816 elsif ( $i =~ /^-(version|v)$/ ) {
1820 elsif ( $i =~ /^-(dump-defaults|ddf)$/ ) {
1821 dump_defaults(@$rdefaults);
1824 elsif ( $i =~ /^-(dump-long-names|dln)$/ ) {
1825 dump_long_names(@$roption_string);
1828 elsif ( $i =~ /^-(dump-short-names|dsn)$/ ) {
1829 dump_short_names($rexpansion);
1832 elsif ( $i =~ /^-(dump-token-types|dtt)$/ ) {
1833 Perl::Tidy::Tokenizer->dump_token_types(*STDOUT);
1838 if ( $saw_dump_profile && $saw_ignore_profile ) {
1839 warn "No profile to dump because of -npro\n";
1843 #---------------------------------------------------------------
1844 # read any .perltidyrc configuration file
1845 #---------------------------------------------------------------
1846 unless ($saw_ignore_profile) {
1848 # resolve possible conflict between $perltidyrc_stream passed
1849 # as call parameter to perltidy and -pro=filename on command
1851 if ($perltidyrc_stream) {
1854 Conflict: a perltidyrc configuration file was specified both as this
1855 perltidy call parameter: $perltidyrc_stream
1856 and with this -profile=$config_file.
1857 Using -profile=$config_file.
1861 $config_file = $perltidyrc_stream;
1865 # look for a config file if we don't have one yet
1866 my $rconfig_file_chatter;
1867 $$rconfig_file_chatter = "";
1869 find_config_file( $is_Windows, $Windows_type, $rconfig_file_chatter,
1870 $rpending_complaint )
1871 unless $config_file;
1873 # open any config file
1876 ( $fh_config, $config_file ) =
1877 Perl::Tidy::streamhandle( $config_file, 'r' );
1878 unless ($fh_config) {
1879 $$rconfig_file_chatter .=
1880 "# $config_file exists but cannot be opened\n";
1884 if ($saw_dump_profile) {
1885 if ($saw_dump_profile) {
1886 dump_config_file( $fh_config, $config_file,
1887 $rconfig_file_chatter );
1894 my ( $rconfig_list, $death_message ) =
1895 read_config_file( $fh_config, $config_file, $rexpansion );
1896 die $death_message if ($death_message);
1898 # process any .perltidyrc parameters right now so we can
1900 if (@$rconfig_list) {
1901 local @ARGV = @$rconfig_list;
1903 expand_command_abbreviations( $rexpansion, \@raw_options,
1906 if ( !GetOptions( \%Opts, @$roption_string ) ) {
1908 "Error in this config file: $config_file \nUse -npro to ignore this file, -h for help'\n";
1911 # Anything left in this local @ARGV is an error and must be
1912 # invalid bare words from the configuration file. We cannot
1913 # check this earlier because bare words may have been valid
1914 # values for parameters. We had to wait for GetOptions to have
1918 my $str = "\'" . pop(@ARGV) . "\'";
1919 while ( my $param = pop(@ARGV) ) {
1920 if ( length($str) < 70 ) {
1921 $str .= ", '$param'";
1929 There are $count unrecognized values in the configuration file '$config_file':
1931 Use leading dashes for parameters. Use -npro to ignore this file.
1935 # Undo any options which cause premature exit. They are not
1936 # appropriate for a config file, and it could be hard to
1937 # diagnose the cause of the premature exit.
1946 dump-want-left-space
1947 dump-want-right-space
1955 if ( defined( $Opts{$_} ) ) {
1957 warn "ignoring --$_ in config file: $config_file\n";
1964 #---------------------------------------------------------------
1965 # now process the command line parameters
1966 #---------------------------------------------------------------
1967 expand_command_abbreviations( $rexpansion, \@raw_options, $config_file );
1969 if ( !GetOptions( \%Opts, @$roption_string ) ) {
1970 die "Error on command line; for help try 'perltidy -h'\n";
1973 return ( \%Opts, $config_file, \@raw_options, $saw_extrude, $roption_string,
1974 $rexpansion, $roption_category, $roption_range );
1975 } # end of process_command_line
1979 my ( $rOpts, $is_Windows, $Windows_type, $rpending_complaint ) = @_;
1981 #---------------------------------------------------------------
1982 # check and handle any interactions among the basic options..
1983 #---------------------------------------------------------------
1985 # Since -vt, -vtc, and -cti are abbreviations, but under
1986 # msdos, an unquoted input parameter like vtc=1 will be
1987 # seen as 2 parameters, vtc and 1, so the abbreviations
1988 # won't be seen. Therefore, we will catch them here if
1991 if ( defined $rOpts->{'vertical-tightness'} ) {
1992 my $vt = $rOpts->{'vertical-tightness'};
1993 $rOpts->{'paren-vertical-tightness'} = $vt;
1994 $rOpts->{'square-bracket-vertical-tightness'} = $vt;
1995 $rOpts->{'brace-vertical-tightness'} = $vt;
1998 if ( defined $rOpts->{'vertical-tightness-closing'} ) {
1999 my $vtc = $rOpts->{'vertical-tightness-closing'};
2000 $rOpts->{'paren-vertical-tightness-closing'} = $vtc;
2001 $rOpts->{'square-bracket-vertical-tightness-closing'} = $vtc;
2002 $rOpts->{'brace-vertical-tightness-closing'} = $vtc;
2005 if ( defined $rOpts->{'closing-token-indentation'} ) {
2006 my $cti = $rOpts->{'closing-token-indentation'};
2007 $rOpts->{'closing-square-bracket-indentation'} = $cti;
2008 $rOpts->{'closing-brace-indentation'} = $cti;
2009 $rOpts->{'closing-paren-indentation'} = $cti;
2012 # In quiet mode, there is no log file and hence no way to report
2013 # results of syntax check, so don't do it.
2014 if ( $rOpts->{'quiet'} ) {
2015 $rOpts->{'check-syntax'} = 0;
2018 # can't check syntax if no output
2019 if ( $rOpts->{'format'} ne 'tidy' ) {
2020 $rOpts->{'check-syntax'} = 0;
2023 # Never let Windows 9x/Me systems run syntax check -- this will prevent a
2024 # wide variety of nasty problems on these systems, because they cannot
2025 # reliably run backticks. Don't even think about changing this!
2026 if ( $rOpts->{'check-syntax'}
2028 && ( !$Windows_type || $Windows_type =~ /^(9|Me)/ ) )
2030 $rOpts->{'check-syntax'} = 0;
2033 # It's really a bad idea to check syntax as root unless you wrote
2034 # the script yourself. FIXME: not sure if this works with VMS
2035 unless ($is_Windows) {
2037 if ( $< == 0 && $rOpts->{'check-syntax'} ) {
2038 $rOpts->{'check-syntax'} = 0;
2039 $$rpending_complaint .=
2040 "Syntax check deactivated for safety; you shouldn't run this as root\n";
2044 # see if user set a non-negative logfile-gap
2045 if ( defined( $rOpts->{'logfile-gap'} ) && $rOpts->{'logfile-gap'} >= 0 ) {
2047 # a zero gap will be taken as a 1
2048 if ( $rOpts->{'logfile-gap'} == 0 ) {
2049 $rOpts->{'logfile-gap'} = 1;
2052 # setting a non-negative logfile gap causes logfile to be saved
2053 $rOpts->{'logfile'} = 1;
2056 # not setting logfile gap, or setting it negative, causes default of 50
2058 $rOpts->{'logfile-gap'} = 50;
2061 # set short-cut flag when only indentation is to be done.
2062 # Note that the user may or may not have already set the
2064 if ( !$rOpts->{'add-whitespace'}
2065 && !$rOpts->{'delete-old-whitespace'}
2066 && !$rOpts->{'add-newlines'}
2067 && !$rOpts->{'delete-old-newlines'} )
2069 $rOpts->{'indent-only'} = 1;
2072 # -isbc implies -ibc
2073 if ( $rOpts->{'indent-spaced-block-comments'} ) {
2074 $rOpts->{'indent-block-comments'} = 1;
2077 # -bli flag implies -bl
2078 if ( $rOpts->{'brace-left-and-indent'} ) {
2079 $rOpts->{'opening-brace-on-new-line'} = 1;
2082 if ( $rOpts->{'opening-brace-always-on-right'}
2083 && $rOpts->{'opening-brace-on-new-line'} )
2086 Conflict: you specified both 'opening-brace-always-on-right' (-bar) and
2087 'opening-brace-on-new-line' (-bl). Ignoring -bl.
2089 $rOpts->{'opening-brace-on-new-line'} = 0;
2092 # it simplifies things if -bl is 0 rather than undefined
2093 if ( !defined( $rOpts->{'opening-brace-on-new-line'} ) ) {
2094 $rOpts->{'opening-brace-on-new-line'} = 0;
2097 # -sbl defaults to -bl if not defined
2098 if ( !defined( $rOpts->{'opening-sub-brace-on-new-line'} ) ) {
2099 $rOpts->{'opening-sub-brace-on-new-line'} =
2100 $rOpts->{'opening-brace-on-new-line'};
2103 # set shortcut flag if no blanks to be written
2104 unless ( $rOpts->{'maximum-consecutive-blank-lines'} ) {
2105 $rOpts->{'swallow-optional-blank-lines'} = 1;
2108 if ( $rOpts->{'entab-leading-whitespace'} ) {
2109 if ( $rOpts->{'entab-leading-whitespace'} < 0 ) {
2110 warn "-et=n must use a positive integer; ignoring -et\n";
2111 $rOpts->{'entab-leading-whitespace'} = undef;
2114 # entab leading whitespace has priority over the older 'tabs' option
2115 if ( $rOpts->{'tabs'} ) { $rOpts->{'tabs'} = 0; }
2119 sub expand_command_abbreviations {
2121 # go through @ARGV and expand any abbreviations
2123 my ( $rexpansion, $rraw_options, $config_file ) = @_;
2126 # set a pass limit to prevent an infinite loop;
2127 # 10 should be plenty, but it may be increased to allow deeply
2128 # nested expansions.
2129 my $max_passes = 10;
2132 # keep looping until all expansions have been converted into actual
2134 for ( my $pass_count = 0 ; $pass_count <= $max_passes ; $pass_count++ ) {
2136 my $abbrev_count = 0;
2138 # loop over each item in @ARGV..
2139 foreach $word (@ARGV) {
2141 # convert any leading 'no-' to just 'no'
2142 if ( $word =~ /^(-[-]?no)-(.*)/ ) { $word = $1 . $2 }
2144 # if it is a dash flag (instead of a file name)..
2145 if ( $word =~ /^-[-]?([\w\-]+)(.*)/ ) {
2150 # save the raw input for debug output in case of circular refs
2151 if ( $pass_count == 0 ) {
2152 push( @$rraw_options, $word );
2155 # recombine abbreviation and flag, if necessary,
2156 # to allow abbreviations with arguments such as '-vt=1'
2157 if ( $rexpansion->{ $abr . $flags } ) {
2158 $abr = $abr . $flags;
2162 # if we see this dash item in the expansion hash..
2163 if ( $rexpansion->{$abr} ) {
2166 # stuff all of the words that it expands to into the
2167 # new arg list for the next pass
2168 foreach my $abbrev ( @{ $rexpansion->{$abr} } ) {
2169 next unless $abbrev; # for safety; shouldn't happen
2170 push( @new_argv, '--' . $abbrev . $flags );
2174 # not in expansion hash, must be actual long name
2176 push( @new_argv, $word );
2180 # not a dash item, so just save it for the next pass
2182 push( @new_argv, $word );
2184 } # end of this pass
2186 # update parameter list @ARGV to the new one
2188 last unless ( $abbrev_count > 0 );
2190 # make sure we are not in an infinite loop
2191 if ( $pass_count == $max_passes ) {
2193 "I'm tired. We seem to be in an infinite loop trying to expand aliases.\n";
2194 print STDERR "Here are the raw options\n";
2196 print STDERR "(@$rraw_options)\n";
2197 my $num = @new_argv;
2200 print STDERR "After $max_passes passes here is ARGV\n";
2201 print STDERR "(@new_argv)\n";
2204 print STDERR "After $max_passes passes ARGV has $num entries\n";
2209 Please check your configuration file $config_file for circular-references.
2210 To deactivate it, use -npro.
2215 Program bug - circular-references in the %expansion hash, probably due to
2216 a recent program change.
2219 } # end of check for circular references
2220 } # end of loop over all passes
2223 # Debug routine -- this will dump the expansion hash
2224 sub dump_short_names {
2225 my $rexpansion = shift;
2227 List of short names. This list shows how all abbreviations are
2228 translated into other abbreviations and, eventually, into long names.
2229 New abbreviations may be defined in a .perltidyrc file.
2230 For a list of all long names, use perltidy --dump-long-names (-dln).
2231 --------------------------------------------------------------------------
2233 foreach my $abbrev ( sort keys %$rexpansion ) {
2234 my @list = @{ $$rexpansion{$abbrev} };
2235 print STDOUT "$abbrev --> @list\n";
2239 sub check_vms_filename {
2241 # given a valid filename (the perltidy input file)
2242 # create a modified filename and separator character
2245 # Contributed by Michael Cartmell
2247 my ( $base, $path ) = fileparse( $_[0] );
2249 # remove explicit ; version
2250 $base =~ s/;-?\d*$//
2252 # remove explicit . version ie two dots in filename NB ^ escapes a dot
2253 or $base =~ s/( # begin capture $1
2254 (?:^|[^^])\. # match a dot not preceded by a caret
2255 (?: # followed by nothing
2257 .*[^^] # anything ending in a non caret
2260 \.-?\d*$ # match . version number
2263 # normalise filename, if there are no unescaped dots then append one
2264 $base .= '.' unless $base =~ /(?:^|[^^])\./;
2266 # if we don't already have an extension then we just append the extention
2267 my $separator = ( $base =~ /\.$/ ) ? "" : "_";
2268 return ( $path . $base, $separator );
2273 # TODO: are these more standard names?
2274 # Win32s Win95 Win98 WinMe WinNT3.51 WinNT4 Win2000 WinXP/.Net Win2003
2276 # Returns a string that determines what MS OS we are on.
2277 # Returns win32s,95,98,Me,NT3.51,NT4,2000,XP/.Net,Win2003
2278 # Returns blank string if not an MS system.
2279 # Original code contributed by: Yves Orton
2280 # We need to know this to decide where to look for config files
2282 my $rpending_complaint = shift;
2284 return $os unless $^O =~ /win32|dos/i; # is it a MS box?
2286 # Systems built from Perl source may not have Win32.pm
2287 # But probably have Win32::GetOSVersion() anyway so the
2288 # following line is not 'required':
2289 # return $os unless eval('require Win32');
2291 # Use the standard API call to determine the version
2292 my ( $undef, $major, $minor, $build, $id );
2293 eval { ( $undef, $major, $minor, $build, $id ) = Win32::GetOSVersion() };
2296 # NAME ID MAJOR MINOR
2297 # Windows NT 4 2 4 0
2298 # Windows 2000 2 5 0
2300 # Windows Server 2003 2 5 2
2302 return "win32s" unless $id; # If id==0 then its a win32s box.
2303 $os = { # Magic numbers from MSDN
2304 # documentation of GetOSVersion
2311 0 => "2000", # or NT 4, see below
2318 # If $os is undefined, the above code is out of date. Suggested updates
2320 unless ( defined $os ) {
2322 $$rpending_complaint .= <<EOS;
2323 Error trying to discover Win_OS_Type: $id:$major:$minor Has no name of record!
2324 We won't be able to look for a system-wide config file.
2328 # Unfortunately the logic used for the various versions isnt so clever..
2329 # so we have to handle an outside case.
2330 return ( $os eq "2000" && $major != 5 ) ? "NT4" : $os;
2334 return ( $^O !~ /win32|dos/i )
2337 && ( $^O ne 'MacOS' );
2340 sub look_for_Windows {
2342 # determine Windows sub-type and location of
2343 # system-wide configuration files
2344 my $rpending_complaint = shift;
2345 my $is_Windows = ( $^O =~ /win32|dos/i );
2346 my $Windows_type = Win_OS_Type($rpending_complaint) if $is_Windows;
2347 return ( $is_Windows, $Windows_type );
2350 sub find_config_file {
2352 # look for a .perltidyrc configuration file
2353 my ( $is_Windows, $Windows_type, $rconfig_file_chatter,
2354 $rpending_complaint ) = @_;
2356 $$rconfig_file_chatter .= "# Config file search...system reported as:";
2358 $$rconfig_file_chatter .= "Windows $Windows_type\n";
2361 $$rconfig_file_chatter .= " $^O\n";
2364 # sub to check file existance and record all tests
2365 my $exists_config_file = sub {
2366 my $config_file = shift;
2367 return 0 unless $config_file;
2368 $$rconfig_file_chatter .= "# Testing: $config_file\n";
2369 return -f $config_file;
2374 # look in current directory first
2375 $config_file = ".perltidyrc";
2376 return $config_file if $exists_config_file->($config_file);
2378 # Default environment vars.
2379 my @envs = qw(PERLTIDY HOME);
2381 # Check the NT/2k/XP locations, first a local machine def, then a
2383 push @envs, qw(USERPROFILE HOMESHARE) if $^O =~ /win32/i;
2385 # Now go through the enviornment ...
2386 foreach my $var (@envs) {
2387 $$rconfig_file_chatter .= "# Examining: \$ENV{$var}";
2388 if ( defined( $ENV{$var} ) ) {
2389 $$rconfig_file_chatter .= " = $ENV{$var}\n";
2391 # test ENV{ PERLTIDY } as file:
2392 if ( $var eq 'PERLTIDY' ) {
2393 $config_file = "$ENV{$var}";
2394 return $config_file if $exists_config_file->($config_file);
2397 # test ENV as directory:
2398 $config_file = catfile( $ENV{$var}, ".perltidyrc" );
2399 return $config_file if $exists_config_file->($config_file);
2402 $$rconfig_file_chatter .= "\n";
2406 # then look for a system-wide definition
2407 # where to look varies with OS
2410 if ($Windows_type) {
2411 my ( $os, $system, $allusers ) =
2412 Win_Config_Locs( $rpending_complaint, $Windows_type );
2414 # Check All Users directory, if there is one.
2416 $config_file = catfile( $allusers, ".perltidyrc" );
2417 return $config_file if $exists_config_file->($config_file);
2420 # Check system directory.
2421 $config_file = catfile( $system, ".perltidyrc" );
2422 return $config_file if $exists_config_file->($config_file);
2426 # Place to add customization code for other systems
2427 elsif ( $^O eq 'OS2' ) {
2429 elsif ( $^O eq 'MacOS' ) {
2431 elsif ( $^O eq 'VMS' ) {
2434 # Assume some kind of Unix
2437 $config_file = "/usr/local/etc/perltidyrc";
2438 return $config_file if $exists_config_file->($config_file);
2440 $config_file = "/etc/perltidyrc";
2441 return $config_file if $exists_config_file->($config_file);
2444 # Couldn't find a config file
2448 sub Win_Config_Locs {
2450 # In scalar context returns the OS name (95 98 ME NT3.51 NT4 2000 XP),
2451 # or undef if its not a win32 OS. In list context returns OS, System
2452 # Directory, and All Users Directory. All Users will be empty on a
2453 # 9x/Me box. Contributed by: Yves Orton.
2455 my $rpending_complaint = shift;
2456 my $os = (@_) ? shift : Win_OS_Type();
2462 if ( $os =~ /9[58]|Me/ ) {
2463 $system = "C:/Windows";
2465 elsif ( $os =~ /NT|XP|200?/ ) {
2466 $system = ( $os =~ /XP/ ) ? "C:/Windows/" : "C:/WinNT/";
2469 ? "C:/WinNT/profiles/All Users/"
2470 : "C:/Documents and Settings/All Users/";
2474 # This currently would only happen on a win32s computer. I dont have
2475 # one to test, so I am unsure how to proceed. Suggestions welcome!
2476 $$rpending_complaint .=
2477 "I dont know a sensible place to look for config files on an $os system.\n";
2480 return wantarray ? ( $os, $system, $allusers ) : $os;
2483 sub dump_config_file {
2485 my $config_file = shift;
2486 my $rconfig_file_chatter = shift;
2487 print STDOUT "$$rconfig_file_chatter";
2489 print STDOUT "# Dump of file: '$config_file'\n";
2490 while ( my $line = $fh->getline() ) { print STDOUT $line }
2491 eval { $fh->close() };
2494 print STDOUT "# ...no config file found\n";
2498 sub read_config_file {
2500 my ( $fh, $config_file, $rexpansion ) = @_;
2501 my @config_list = ();
2503 # file is bad if non-empty $death_message is returned
2504 my $death_message = "";
2508 while ( my $line = $fh->getline() ) {
2511 next if $line =~ /^\s*#/; # skip full-line comment
2512 ( $line, $death_message ) =
2513 strip_comment( $line, $config_file, $line_no );
2514 last if ($death_message);
2515 $line =~ s/^\s*(.*?)\s*$/$1/; # trim both ends
2518 # look for something of the general form
2523 if ( $line =~ /^((\w+)\s*\{)?([^}]*)(\})?$/ ) {
2524 my ( $newname, $body, $curly ) = ( $2, $3, $4 );
2526 # handle a new alias definition
2530 "No '}' seen after $name and before $newname in config file $config_file line $.\n";
2535 if ( ${$rexpansion}{$name} ) {
2537 my @names = sort keys %$rexpansion;
2539 "Here is a list of all installed aliases\n(@names)\n"
2540 . "Attempting to redefine alias ($name) in config file $config_file line $.\n";
2543 ${$rexpansion}{$name} = [];
2549 my ( $rbody_parts, $msg ) = parse_args($body);
2551 $death_message = <<EOM;
2552 Error reading file '$config_file' at line number $line_no.
2554 Please fix this line or use -npro to avoid reading this file
2561 # remove leading dashes if this is an alias
2562 foreach (@$rbody_parts) { s/^\-+//; }
2563 push @{ ${$rexpansion}{$name} }, @$rbody_parts;
2566 push( @config_list, @$rbody_parts );
2573 "Unexpected '}' seen in config file $config_file line $.\n";
2580 eval { $fh->close() };
2581 return ( \@config_list, $death_message );
2586 my ( $instr, $config_file, $line_no ) = @_;
2589 # nothing to do if no comments
2590 if ( $instr !~ /#/ ) {
2591 return ( $instr, $msg );
2594 # use simple method of no quotes
2595 elsif ( $instr !~ /['"]/ ) {
2596 $instr =~ s/\s*\#.*$//; # simple trim
2597 return ( $instr, $msg );
2600 # handle comments and quotes
2602 my $quote_char = "";
2605 # looking for ending quote character
2607 if ( $instr =~ /\G($quote_char)/gc ) {
2611 elsif ( $instr =~ /\G(.)/gc ) {
2615 # error..we reached the end without seeing the ending quote char
2618 Error reading file $config_file at line number $line_no.
2619 Did not see ending quote character <$quote_char> in this text:
2621 Please fix this line or use -npro to avoid reading this file
2627 # accumulating characters and looking for start of a quoted string
2629 if ( $instr =~ /\G([\"\'])/gc ) {
2633 elsif ( $instr =~ /\G#/gc ) {
2636 elsif ( $instr =~ /\G(.)/gc ) {
2644 return ( $outstr, $msg );
2649 # Parse a command string containing multiple string with possible
2650 # quotes, into individual commands. It might look like this, for example:
2652 # -wba=" + - " -some-thing -wbb='. && ||'
2654 # There is no need, at present, to handle escaped quote characters.
2655 # (They are not perltidy tokens, so needn't be in strings).
2658 my @body_parts = ();
2659 my $quote_char = "";
2664 # looking for ending quote character
2666 if ( $body =~ /\G($quote_char)/gc ) {
2669 elsif ( $body =~ /\G(.)/gc ) {
2673 # error..we reached the end without seeing the ending quote char
2675 if ( length($part) ) { push @body_parts, $part; }
2677 Did not see ending quote character <$quote_char> in this text:
2684 # accumulating characters and looking for start of a quoted string
2686 if ( $body =~ /\G([\"\'])/gc ) {
2689 elsif ( $body =~ /\G(\s+)/gc ) {
2690 if ( length($part) ) { push @body_parts, $part; }
2693 elsif ( $body =~ /\G(.)/gc ) {
2697 if ( length($part) ) { push @body_parts, $part; }
2702 return ( \@body_parts, $msg );
2705 sub dump_long_names {
2707 my @names = sort @_;
2709 # Command line long names (passed to GetOptions)
2710 #---------------------------------------------------------------
2711 # here is a summary of the Getopt codes:
2712 # <none> does not take an argument
2713 # =s takes a mandatory string
2714 # :s takes an optional string
2715 # =i takes a mandatory integer
2716 # :i takes an optional integer
2717 # ! does not take an argument and may be negated
2718 # i.e., -foo and -nofoo are allowed
2719 # a double dash signals the end of the options list
2721 #---------------------------------------------------------------
2724 foreach (@names) { print STDOUT "$_\n" }
2728 my @defaults = sort @_;
2729 print STDOUT "Default command line options:\n";
2730 foreach (@_) { print STDOUT "$_\n" }
2735 # write the options back out as a valid .perltidyrc file
2736 my ( $rOpts, $roption_string ) = @_;
2738 my $rGetopt_flags = \%Getopt_flags;
2739 foreach my $opt ( @{$roption_string} ) {
2741 if ( $opt =~ /(.*)(!|=.*)$/ ) {
2745 if ( defined( $rOpts->{$opt} ) ) {
2746 $rGetopt_flags->{$opt} = $flag;
2749 print STDOUT "# Final parameter set for this run:\n";
2750 foreach my $key ( sort keys %{$rOpts} ) {
2751 my $flag = $rGetopt_flags->{$key};
2752 my $value = $rOpts->{$key};
2756 if ( $flag =~ /^=/ ) {
2757 if ( $value !~ /^\d+$/ ) { $value = '"' . $value . '"' }
2758 $suffix = "=" . $value;
2760 elsif ( $flag =~ /^!/ ) {
2761 $prefix .= "no" unless ($value);
2767 "# ERROR in dump_options: unrecognized flag $flag for $key\n";
2770 print STDOUT $prefix . $key . $suffix . "\n";
2776 This is perltidy, v$VERSION
2778 Copyright 2000-2007, Steve Hancock
2780 Perltidy is free software and may be copied under the terms of the GNU
2781 General Public License, which is included in the distribution files.
2783 Complete documentation for perltidy can be found using 'man perltidy'
2784 or on the internet at http://perltidy.sourceforge.net.
2791 This is perltidy version $VERSION, a perl script indenter. Usage:
2793 perltidy [ options ] file1 file2 file3 ...
2794 (output goes to file1.tdy, file2.tdy, file3.tdy, ...)
2795 perltidy [ options ] file1 -o outfile
2796 perltidy [ options ] file1 -st >outfile
2797 perltidy [ options ] <infile >outfile
2799 Options have short and long forms. Short forms are shown; see
2800 man pages for long forms. Note: '=s' indicates a required string,
2801 and '=n' indicates a required integer.
2805 -o=file name of the output file (only if single input file)
2806 -oext=s change output extension from 'tdy' to s
2807 -opath=path change path to be 'path' for output files
2808 -b backup original to .bak and modify file in-place
2809 -bext=s change default backup extension from 'bak' to s
2810 -q deactivate error messages (for running under editor)
2811 -w include non-critical warning messages in the .ERR error output
2812 -syn run perl -c to check syntax (default under unix systems)
2813 -log save .LOG file, which has useful diagnostics
2814 -f force perltidy to read a binary file
2815 -g like -log but writes more detailed .LOG file, for debugging scripts
2816 -opt write the set of options actually used to a .LOG file
2817 -npro ignore .perltidyrc configuration command file
2818 -pro=file read configuration commands from file instead of .perltidyrc
2819 -st send output to standard output, STDOUT
2820 -se send error output to standard error output, STDERR
2821 -v display version number to standard output and quit
2824 -i=n use n columns per indentation level (default n=4)
2825 -t tabs: use one tab character per indentation level, not recommeded
2826 -nt no tabs: use n spaces per indentation level (default)
2827 -et=n entab leading whitespace n spaces per tab; not recommended
2828 -io "indent only": just do indentation, no other formatting.
2829 -sil=n set starting indentation level to n; use if auto detection fails
2830 -ole=s specify output line ending (s=dos or win, mac, unix)
2831 -ple keep output line endings same as input (input must be filename)
2834 -fws freeze whitespace; this disables all whitespace changes
2835 and disables the following switches:
2836 -bt=n sets brace tightness, n= (0 = loose, 1=default, 2 = tight)
2837 -bbt same as -bt but for code block braces; same as -bt if not given
2838 -bbvt block braces vertically tight; use with -bl or -bli
2839 -bbvtl=s make -bbvt to apply to selected list of block types
2840 -pt=n paren tightness (n=0, 1 or 2)
2841 -sbt=n square bracket tightness (n=0, 1, or 2)
2842 -bvt=n brace vertical tightness,
2843 n=(0=open, 1=close unless multiple steps on a line, 2=always close)
2844 -pvt=n paren vertical tightness (see -bvt for n)
2845 -sbvt=n square bracket vertical tightness (see -bvt for n)
2846 -bvtc=n closing brace vertical tightness:
2847 n=(0=open, 1=sometimes close, 2=always close)
2848 -pvtc=n closing paren vertical tightness, see -bvtc for n.
2849 -sbvtc=n closing square bracket vertical tightness, see -bvtc for n.
2850 -ci=n sets continuation indentation=n, default is n=2 spaces
2851 -lp line up parentheses, brackets, and non-BLOCK braces
2852 -sfs add space before semicolon in for( ; ; )
2853 -aws allow perltidy to add whitespace (default)
2854 -dws delete all old non-essential whitespace
2855 -icb indent closing brace of a code block
2856 -cti=n closing indentation of paren, square bracket, or non-block brace:
2857 n=0 none, =1 align with opening, =2 one full indentation level
2858 -icp equivalent to -cti=2
2859 -wls=s want space left of tokens in string; i.e. -nwls='+ - * /'
2860 -wrs=s want space right of tokens in string;
2861 -sts put space before terminal semicolon of a statement
2862 -sak=s put space between keywords given in s and '(';
2863 -nsak=s no space between keywords in s and '('; i.e. -nsak='my our local'
2866 -fnl freeze newlines; this disables all line break changes
2867 and disables the following switches:
2868 -anl add newlines; ok to introduce new line breaks
2869 -bbs add blank line before subs and packages
2870 -bbc add blank line before block comments
2871 -bbb add blank line between major blocks
2872 -sob swallow optional blank lines
2873 -ce cuddled else; use this style: '} else {'
2874 -dnl delete old newlines (default)
2875 -mbl=n maximum consecutive blank lines (default=1)
2876 -l=n maximum line length; default n=80
2877 -bl opening brace on new line
2878 -sbl opening sub brace on new line. value of -bl is used if not given.
2879 -bli opening brace on new line and indented
2880 -bar opening brace always on right, even for long clauses
2881 -vt=n vertical tightness (requires -lp); n controls break after opening
2882 token: 0=never 1=no break if next line balanced 2=no break
2883 -vtc=n vertical tightness of closing container; n controls if closing
2884 token starts new line: 0=always 1=not unless list 1=never
2885 -wba=s want break after tokens in string; i.e. wba=': .'
2886 -wbb=s want break before tokens in string
2888 Following Old Breakpoints
2889 -boc break at old comma breaks: turns off all automatic list formatting
2890 -bol break at old logical breakpoints: or, and, ||, && (default)
2891 -bok break at old list keyword breakpoints such as map, sort (default)
2892 -bot break at old conditional (ternary ?:) operator breakpoints (default)
2893 -cab=n break at commas after a comma-arrow (=>):
2894 n=0 break at all commas after =>
2895 n=1 stable: break unless this breaks an existing one-line container
2896 n=2 break only if a one-line container cannot be formed
2897 n=3 do not treat commas after => specially at all
2900 -ibc indent block comments (default)
2901 -isbc indent spaced block comments; may indent unless no leading space
2902 -msc=n minimum desired spaces to side comment, default 4
2903 -csc add or update closing side comments after closing BLOCK brace
2904 -dcsc delete closing side comments created by a -csc command
2905 -cscp=s change closing side comment prefix to be other than '## end'
2906 -cscl=s change closing side comment to apply to selected list of blocks
2907 -csci=n minimum number of lines needed to apply a -csc tag, default n=6
2908 -csct=n maximum number of columns of appended text, default n=20
2909 -cscw causes warning if old side comment is overwritten with -csc
2911 -sbc use 'static block comments' identified by leading '##' (default)
2912 -sbcp=s change static block comment identifier to be other than '##'
2913 -osbc outdent static block comments
2915 -ssc use 'static side comments' identified by leading '##' (default)
2916 -sscp=s change static side comment identifier to be other than '##'
2918 Delete selected text
2919 -dac delete all comments AND pod
2920 -dbc delete block comments
2921 -dsc delete side comments
2924 Send selected text to a '.TEE' file
2925 -tac tee all comments AND pod
2926 -tbc tee block comments
2927 -tsc tee side comments
2931 -olq outdent long quoted strings (default)
2932 -olc outdent a long block comment line
2933 -ola outdent statement labels
2934 -okw outdent control keywords (redo, next, last, goto, return)
2935 -okwl=s specify alternative keywords for -okw command
2938 -mft=n maximum fields per table; default n=40
2939 -x do not format lines before hash-bang line (i.e., for VMS)
2940 -asc allows perltidy to add a ';' when missing (default)
2941 -dsm allows perltidy to delete an unnecessary ';' (default)
2943 Combinations of other parameters
2944 -gnu attempt to follow GNU Coding Standards as applied to perl
2945 -mangle remove as many newlines as possible (but keep comments and pods)
2946 -extrude insert as many newlines as possible
2948 Dump and die, debugging
2949 -dop dump options used in this run to standard output and quit
2950 -ddf dump default options to standard output and quit
2951 -dsn dump all option short names to standard output and quit
2952 -dln dump option long names to standard output and quit
2953 -dpro dump whatever configuration file is in effect to standard output
2954 -dtt dump all token types to standard output and quit
2957 -html write an html file (see 'man perl2web' for many options)
2958 Note: when -html is used, no indentation or formatting are done.
2959 Hint: try perltidy -html -css=mystyle.css filename.pl
2960 and edit mystyle.css to change the appearance of filename.html.
2961 -nnn gives line numbers
2962 -pre only writes out <pre>..</pre> code section
2963 -toc places a table of contents to subs at the top (default)
2964 -pod passes pod text through pod2html (default)
2965 -frm write html as a frame (3 files)
2966 -text=s extra extension for table of contents if -frm, default='toc'
2967 -sext=s extra extension for file content if -frm, default='src'
2969 A prefix of "n" negates short form toggle switches, and a prefix of "no"
2970 negates the long forms. For example, -nasc means don't add missing
2973 If you are unable to see this entire text, try "perltidy -h | more"
2974 For more detailed information, and additional options, try "man perltidy",
2975 or go to the perltidy home page at http://perltidy.sourceforge.net
2980 sub process_this_file {
2982 my ( $truth, $beauty ) = @_;
2984 # loop to process each line of this file
2985 while ( my $line_of_tokens = $truth->get_line() ) {
2986 $beauty->write_line($line_of_tokens);
2990 eval { $beauty->finish_formatting() };
2991 $truth->report_tokenization_errors();
2996 # Use 'perl -c' to make sure that we did not create bad syntax
2997 # This is a very good independent check for programming errors
2999 # Given names of the input and output files, ($ifname, $ofname),
3000 # we do the following:
3001 # - check syntax of the input file
3002 # - if bad, all done (could be an incomplete code snippet)
3003 # - if infile syntax ok, then check syntax of the output file;
3004 # - if outfile syntax bad, issue warning; this implies a code bug!
3005 # - set and return flag "infile_syntax_ok" : =-1 bad 0 unknown 1 good
3007 my ( $ifname, $ofname, $logger_object, $rOpts ) = @_;
3008 my $infile_syntax_ok = 0;
3009 my $line_of_dashes = '-' x 42 . "\n";
3011 my $flags = $rOpts->{'perl-syntax-check-flags'};
3013 # be sure we invoke perl with -c
3014 # note: perl will accept repeated flags like '-c -c'. It is safest
3015 # to append another -c than try to find an interior bundled c, as
3016 # in -Tc, because such a 'c' might be in a quoted string, for example.
3017 if ( $flags !~ /(^-c|\s+-c)/ ) { $flags .= " -c" }
3019 # be sure we invoke perl with -x if requested
3020 # same comments about repeated parameters applies
3021 if ( $rOpts->{'look-for-hash-bang'} ) {
3022 if ( $flags !~ /(^-x|\s+-x)/ ) { $flags .= " -x" }
3025 # this shouldn't happen unless a termporary file couldn't be made
3026 if ( $ifname eq '-' ) {
3027 $logger_object->write_logfile_entry(
3028 "Cannot run perl -c on STDIN and STDOUT\n");
3029 return $infile_syntax_ok;
3032 $logger_object->write_logfile_entry(
3033 "checking input file syntax with perl $flags\n");
3034 $logger_object->write_logfile_entry($line_of_dashes);
3036 # Not all operating systems/shells support redirection of the standard
3038 my $error_redirection = ( $^O eq 'VMS' ) ? "" : '2>&1';
3040 my $perl_output = do_syntax_check( $ifname, $flags, $error_redirection );
3041 $logger_object->write_logfile_entry("$perl_output\n");
3043 if ( $perl_output =~ /syntax\s*OK/ ) {
3044 $infile_syntax_ok = 1;
3045 $logger_object->write_logfile_entry($line_of_dashes);
3046 $logger_object->write_logfile_entry(
3047 "checking output file syntax with perl $flags ...\n");
3048 $logger_object->write_logfile_entry($line_of_dashes);
3051 do_syntax_check( $ofname, $flags, $error_redirection );
3052 $logger_object->write_logfile_entry("$perl_output\n");
3054 unless ( $perl_output =~ /syntax\s*OK/ ) {
3055 $logger_object->write_logfile_entry($line_of_dashes);
3056 $logger_object->warning(
3057 "The output file has a syntax error when tested with perl $flags $ofname !\n"
3059 $logger_object->warning(
3060 "This implies an error in perltidy; the file $ofname is bad\n");
3061 $logger_object->report_definite_bug();
3063 # the perl version number will be helpful for diagnosing the problem
3064 $logger_object->write_logfile_entry(
3065 qx/perl -v $error_redirection/ . "\n" );
3070 # Only warn of perl -c syntax errors. Other messages,
3071 # such as missing modules, are too common. They can be
3072 # seen by running with perltidy -w
3073 $logger_object->complain("A syntax check using perl $flags gives: \n");
3074 $logger_object->complain($line_of_dashes);
3075 $logger_object->complain("$perl_output\n");
3076 $logger_object->complain($line_of_dashes);
3077 $infile_syntax_ok = -1;
3078 $logger_object->write_logfile_entry($line_of_dashes);
3079 $logger_object->write_logfile_entry(
3080 "The output file will not be checked because of input file problems\n"
3083 return $infile_syntax_ok;
3086 sub do_syntax_check {
3087 my ( $fname, $flags, $error_redirection ) = @_;
3089 # We have to quote the filename in case it has unusual characters
3090 # or spaces. Example: this filename #CM11.pm# gives trouble.
3091 $fname = '"' . $fname . '"';
3093 # Under VMS something like -T will become -t (and an error) so we
3094 # will put quotes around the flags. Double quotes seem to work on
3095 # Unix/Windows/VMS, but this may not work on all systems. (Single
3096 # quotes do not work under Windows). It could become necessary to
3097 # put double quotes around each flag, such as: -"c" -"T"
3098 # We may eventually need some system-dependent coding here.
3099 $flags = '"' . $flags . '"';
3101 # now wish for luck...
3102 return qx/perl $flags $fname $error_redirection/;
3105 #####################################################################
3107 # This is a stripped down version of IO::Scalar
3108 # Given a reference to a scalar, it supplies either:
3109 # a getline method which reads lines (mode='r'), or
3110 # a print method which reads lines (mode='w')
3112 #####################################################################
3113 package Perl::Tidy::IOScalar;
3117 my ( $package, $rscalar, $mode ) = @_;
3118 my $ref = ref $rscalar;
3119 if ( $ref ne 'SCALAR' ) {
3121 ------------------------------------------------------------------------
3122 expecting ref to SCALAR but got ref to ($ref); trace follows:
3123 ------------------------------------------------------------------------
3127 if ( $mode eq 'w' ) {
3129 return bless [ $rscalar, $mode ], $package;
3131 elsif ( $mode eq 'r' ) {
3133 # Convert a scalar to an array.
3134 # This avoids looking for "\n" on each call to getline
3135 my @array = map { $_ .= "\n" } split /\n/, ${$rscalar};
3137 return bless [ \@array, $mode, $i_next ], $package;
3141 ------------------------------------------------------------------------
3142 expecting mode = 'r' or 'w' but got mode ($mode); trace follows:
3143 ------------------------------------------------------------------------
3150 my $mode = $self->[1];
3151 if ( $mode ne 'r' ) {
3153 ------------------------------------------------------------------------
3154 getline call requires mode = 'r' but mode = ($mode); trace follows:
3155 ------------------------------------------------------------------------
3158 my $i = $self->[2]++;
3159 ##my $line = $self->[0]->[$i];
3160 return $self->[0]->[$i];
3165 my $mode = $self->[1];
3166 if ( $mode ne 'w' ) {
3168 ------------------------------------------------------------------------
3169 print call requires mode = 'w' but mode = ($mode); trace follows:
3170 ------------------------------------------------------------------------
3173 ${ $self->[0] } .= $_[0];
3175 sub close { return }
3177 #####################################################################
3179 # This is a stripped down version of IO::ScalarArray
3180 # Given a reference to an array, it supplies either:
3181 # a getline method which reads lines (mode='r'), or
3182 # a print method which reads lines (mode='w')
3184 # NOTE: this routine assumes that that there aren't any embedded
3185 # newlines within any of the array elements. There are no checks
3188 #####################################################################
3189 package Perl::Tidy::IOScalarArray;
3193 my ( $package, $rarray, $mode ) = @_;
3194 my $ref = ref $rarray;
3195 if ( $ref ne 'ARRAY' ) {
3197 ------------------------------------------------------------------------
3198 expecting ref to ARRAY but got ref to ($ref); trace follows:
3199 ------------------------------------------------------------------------
3203 if ( $mode eq 'w' ) {
3205 return bless [ $rarray, $mode ], $package;
3207 elsif ( $mode eq 'r' ) {
3209 return bless [ $rarray, $mode, $i_next ], $package;
3213 ------------------------------------------------------------------------
3214 expecting mode = 'r' or 'w' but got mode ($mode); trace follows:
3215 ------------------------------------------------------------------------
3222 my $mode = $self->[1];
3223 if ( $mode ne 'r' ) {
3225 ------------------------------------------------------------------------
3226 getline requires mode = 'r' but mode = ($mode); trace follows:
3227 ------------------------------------------------------------------------
3230 my $i = $self->[2]++;
3231 ##my $line = $self->[0]->[$i];
3232 return $self->[0]->[$i];
3237 my $mode = $self->[1];
3238 if ( $mode ne 'w' ) {
3240 ------------------------------------------------------------------------
3241 print requires mode = 'w' but mode = ($mode); trace follows:
3242 ------------------------------------------------------------------------
3245 push @{ $self->[0] }, $_[0];
3247 sub close { return }
3249 #####################################################################
3251 # the Perl::Tidy::LineSource class supplies an object with a 'get_line()' method
3252 # which returns the next line to be parsed
3254 #####################################################################
3256 package Perl::Tidy::LineSource;
3260 my ( $class, $input_file, $rOpts, $rpending_logfile_message ) = @_;
3261 my $input_file_copy = undef;
3264 my $input_line_ending;
3265 if ( $rOpts->{'preserve-line-endings'} ) {
3266 $input_line_ending = Perl::Tidy::find_input_line_ending($input_file);
3269 ( my $fh, $input_file ) = Perl::Tidy::streamhandle( $input_file, 'r' );
3270 return undef unless $fh;
3272 # in order to check output syntax when standard output is used,
3273 # or when it is an object, we have to make a copy of the file
3274 if ( ( $input_file eq '-' || ref $input_file ) && $rOpts->{'check-syntax'} )
3277 # Turning off syntax check when input output is used.
3278 # The reason is that temporary files cause problems on
3280 $rOpts->{'check-syntax'} = 0;
3281 $input_file_copy = '-';
3283 $$rpending_logfile_message .= <<EOM;
3284 Note: --syntax check will be skipped because standard input is used
3291 _fh_copy => $fh_copy,
3292 _filename => $input_file,
3293 _input_file_copy => $input_file_copy,
3294 _input_line_ending => $input_line_ending,
3295 _rinput_buffer => [],
3300 sub get_input_file_copy_name {
3302 my $ifname = $self->{_input_file_copy};
3304 $ifname = $self->{_filename};
3309 sub close_input_file {
3311 eval { $self->{_fh}->close() };
3312 eval { $self->{_fh_copy}->close() } if $self->{_fh_copy};
3318 my $fh = $self->{_fh};
3319 my $fh_copy = $self->{_fh_copy};
3320 my $rinput_buffer = $self->{_rinput_buffer};
3322 if ( scalar(@$rinput_buffer) ) {
3323 $line = shift @$rinput_buffer;
3326 $line = $fh->getline();
3328 # patch to read raw mac files under unix, dos
3329 # see if the first line has embedded \r's
3330 if ( $line && !$self->{_started} ) {
3331 if ( $line =~ /[\015][^\015\012]/ ) {
3333 # found one -- break the line up and store in a buffer
3334 @$rinput_buffer = map { $_ . "\n" } split /\015/, $line;
3335 my $count = @$rinput_buffer;
3336 $line = shift @$rinput_buffer;
3338 $self->{_started}++;
3341 if ( $line && $fh_copy ) { $fh_copy->print($line); }
3348 my $fh = $self->{_fh};
3349 my $fh_copy = $self->{_fh_copy};
3350 $line = $fh->getline();
3351 if ( $line && $fh_copy ) { $fh_copy->print($line); }
3355 #####################################################################
3357 # the Perl::Tidy::LineSink class supplies a write_line method for
3358 # actual file writing
3360 #####################################################################
3362 package Perl::Tidy::LineSink;
3366 my ( $class, $output_file, $tee_file, $line_separator, $rOpts,
3367 $rpending_logfile_message, $binmode )
3370 my $fh_copy = undef;
3372 my $output_file_copy = "";
3373 my $output_file_open = 0;
3375 if ( $rOpts->{'format'} eq 'tidy' ) {
3376 ( $fh, $output_file ) = Perl::Tidy::streamhandle( $output_file, 'w' );
3377 unless ($fh) { die "Cannot write to output stream\n"; }
3378 $output_file_open = 1;
3380 if ( ref($fh) eq 'IO::File' ) {
3383 if ( $output_file eq '-' ) { binmode STDOUT }
3387 # in order to check output syntax when standard output is used,
3388 # or when it is an object, we have to make a copy of the file
3389 if ( $output_file eq '-' || ref $output_file ) {
3390 if ( $rOpts->{'check-syntax'} ) {
3392 # Turning off syntax check when standard output is used.
3393 # The reason is that temporary files cause problems on
3395 $rOpts->{'check-syntax'} = 0;
3396 $output_file_copy = '-';
3397 $$rpending_logfile_message .= <<EOM;
3398 Note: --syntax check will be skipped because standard output is used
3406 _fh_copy => $fh_copy,
3408 _output_file => $output_file,
3409 _output_file_open => $output_file_open,
3410 _output_file_copy => $output_file_copy,
3412 _tee_file => $tee_file,
3413 _tee_file_opened => 0,
3414 _line_separator => $line_separator,
3415 _binmode => $binmode,
3422 my $fh = $self->{_fh};
3423 my $fh_copy = $self->{_fh_copy};
3425 my $output_file_open = $self->{_output_file_open};
3427 $_[0] .= $self->{_line_separator};
3429 $fh->print( $_[0] ) if ( $self->{_output_file_open} );
3430 print $fh_copy $_[0] if ( $fh_copy && $self->{_output_file_copy} );
3432 if ( $self->{_tee_flag} ) {
3433 unless ( $self->{_tee_file_opened} ) { $self->really_open_tee_file() }
3434 my $fh_tee = $self->{_fh_tee};
3435 print $fh_tee $_[0];
3439 sub get_output_file_copy {
3441 my $ofname = $self->{_output_file_copy};
3443 $ofname = $self->{_output_file};
3450 $self->{_tee_flag} = 1;
3455 $self->{_tee_flag} = 0;
3458 sub really_open_tee_file {
3460 my $tee_file = $self->{_tee_file};
3462 $fh_tee = IO::File->new(">$tee_file")
3463 or die("couldn't open TEE file $tee_file: $!\n");
3464 binmode $fh_tee if $self->{_binmode};
3465 $self->{_tee_file_opened} = 1;
3466 $self->{_fh_tee} = $fh_tee;
3469 sub close_output_file {
3471 eval { $self->{_fh}->close() } if $self->{_output_file_open};
3472 eval { $self->{_fh_copy}->close() } if ( $self->{_output_file_copy} );
3473 $self->close_tee_file();
3476 sub close_tee_file {
3479 if ( $self->{_tee_file_opened} ) {
3480 eval { $self->{_fh_tee}->close() };
3481 $self->{_tee_file_opened} = 0;
3485 #####################################################################
3487 # The Perl::Tidy::Diagnostics class writes the DIAGNOSTICS file, which is
3488 # useful for program development.
3490 # Only one such file is created regardless of the number of input
3491 # files processed. This allows the results of processing many files
3492 # to be summarized in a single file.
3494 #####################################################################
3496 package Perl::Tidy::Diagnostics;
3502 _write_diagnostics_count => 0,
3503 _last_diagnostic_file => "",
3509 sub set_input_file {
3511 $self->{_input_file} = $_[0];
3514 # This is a diagnostic routine which is useful for program development.
3515 # Output from debug messages go to a file named DIAGNOSTICS, where
3516 # they are labeled by file and line. This allows many files to be
3517 # scanned at once for some particular condition of interest.
3518 sub write_diagnostics {
3521 unless ( $self->{_write_diagnostics_count} ) {
3522 open DIAGNOSTICS, ">DIAGNOSTICS"
3523 or death("couldn't open DIAGNOSTICS: $!\n");
3526 my $last_diagnostic_file = $self->{_last_diagnostic_file};
3527 my $input_file = $self->{_input_file};
3528 if ( $last_diagnostic_file ne $input_file ) {
3529 print DIAGNOSTICS "\nFILE:$input_file\n";
3531 $self->{_last_diagnostic_file} = $input_file;
3532 my $input_line_number = Perl::Tidy::Tokenizer::get_input_line_number();
3533 print DIAGNOSTICS "$input_line_number:\t@_";
3534 $self->{_write_diagnostics_count}++;
3537 #####################################################################
3539 # The Perl::Tidy::Logger class writes the .LOG and .ERR files
3541 #####################################################################
3543 package Perl::Tidy::Logger;
3548 my ( $rOpts, $log_file, $warning_file, $saw_extrude ) = @_;
3550 # remove any old error output file
3551 unless ( ref($warning_file) ) {
3552 if ( -e $warning_file ) { unlink($warning_file) }
3556 _log_file => $log_file,
3557 _fh_warnings => undef,
3559 _fh_warnings => undef,
3560 _last_input_line_written => 0,
3561 _at_end_of_file => 0,
3563 _block_log_output => 0,
3564 _line_of_tokens => undef,
3565 _output_line_number => undef,
3566 _wrote_line_information_string => 0,
3567 _wrote_column_headings => 0,
3568 _warning_file => $warning_file,
3569 _warning_count => 0,
3570 _complaint_count => 0,
3571 _saw_code_bug => -1, # -1=no 0=maybe 1=for sure
3572 _saw_brace_error => 0,
3573 _saw_extrude => $saw_extrude,
3574 _output_array => [],
3578 sub close_log_file {
3581 if ( $self->{_fh_warnings} ) {
3582 eval { $self->{_fh_warnings}->close() };
3583 $self->{_fh_warnings} = undef;
3587 sub get_warning_count {
3589 return $self->{_warning_count};
3592 sub get_use_prefix {
3594 return $self->{_use_prefix};
3597 sub block_log_output {
3599 $self->{_block_log_output} = 1;
3602 sub unblock_log_output {
3604 $self->{_block_log_output} = 0;
3607 sub interrupt_logfile {
3609 $self->{_use_prefix} = 0;
3610 $self->warning("\n");
3611 $self->write_logfile_entry( '#' x 24 . " WARNING " . '#' x 25 . "\n" );
3614 sub resume_logfile {
3616 $self->write_logfile_entry( '#' x 60 . "\n" );
3617 $self->{_use_prefix} = 1;
3620 sub we_are_at_the_last_line {
3622 unless ( $self->{_wrote_line_information_string} ) {
3623 $self->write_logfile_entry("Last line\n\n");
3625 $self->{_at_end_of_file} = 1;
3628 # record some stuff in case we go down in flames
3631 my ( $line_of_tokens, $output_line_number ) = @_;
3632 my $input_line = $line_of_tokens->{_line_text};
3633 my $input_line_number = $line_of_tokens->{_line_number};
3635 # save line information in case we have to write a logfile message
3636 $self->{_line_of_tokens} = $line_of_tokens;
3637 $self->{_output_line_number} = $output_line_number;
3638 $self->{_wrote_line_information_string} = 0;
3640 my $last_input_line_written = $self->{_last_input_line_written};
3641 my $rOpts = $self->{_rOpts};
3644 ( $input_line_number - $last_input_line_written ) >=
3645 $rOpts->{'logfile-gap'}
3647 || ( $input_line =~ /^\s*(sub|package)\s+(\w+)/ )
3650 my $rlevels = $line_of_tokens->{_rlevels};
3651 my $structural_indentation_level = $$rlevels[0];
3652 $self->{_last_input_line_written} = $input_line_number;
3653 ( my $out_str = $input_line ) =~ s/^\s*//;
3656 $out_str = ( '.' x $structural_indentation_level ) . $out_str;
3658 if ( length($out_str) > 35 ) {
3659 $out_str = substr( $out_str, 0, 35 ) . " ....";
3661 $self->logfile_output( "", "$out_str\n" );
3665 sub write_logfile_entry {
3668 # add leading >>> to avoid confusing error mesages and code
3669 $self->logfile_output( ">>>", "@_" );
3672 sub write_column_headings {
3675 $self->{_wrote_column_headings} = 1;
3676 my $routput_array = $self->{_output_array};
3677 push @{$routput_array}, <<EOM;
3678 The nesting depths in the table below are at the start of the lines.
3679 The indicated output line numbers are not always exact.
3680 ci = levels of continuation indentation; bk = 1 if in BLOCK, 0 if not.
3682 in:out indent c b nesting code + messages; (messages begin with >>>)
3683 lines levels i k (code begins with one '.' per indent level)
3684 ------ ----- - - -------- -------------------------------------------
3688 sub make_line_information_string {
3690 # make columns of information when a logfile message needs to go out
3692 my $line_of_tokens = $self->{_line_of_tokens};
3693 my $input_line_number = $line_of_tokens->{_line_number};
3694 my $line_information_string = "";
3695 if ($input_line_number) {
3697 my $output_line_number = $self->{_output_line_number};
3698 my $brace_depth = $line_of_tokens->{_curly_brace_depth};
3699 my $paren_depth = $line_of_tokens->{_paren_depth};
3700 my $square_bracket_depth = $line_of_tokens->{_square_bracket_depth};
3701 my $python_indentation_level =
3702 $line_of_tokens->{_python_indentation_level};
3703 my $rlevels = $line_of_tokens->{_rlevels};
3704 my $rnesting_tokens = $line_of_tokens->{_rnesting_tokens};
3705 my $rci_levels = $line_of_tokens->{_rci_levels};
3706 my $rnesting_blocks = $line_of_tokens->{_rnesting_blocks};
3708 my $structural_indentation_level = $$rlevels[0];
3710 $self->write_column_headings() unless $self->{_wrote_column_headings};
3712 # keep logfile columns aligned for scripts up to 999 lines;
3713 # for longer scripts it doesn't really matter
3714 my $extra_space = "";
3716 ( $input_line_number < 10 ) ? " "
3717 : ( $input_line_number < 100 ) ? " "
3720 ( $output_line_number < 10 ) ? " "
3721 : ( $output_line_number < 100 ) ? " "
3724 # there are 2 possible nesting strings:
3725 # the original which looks like this: (0 [1 {2
3726 # the new one, which looks like this: {{[
3727 # the new one is easier to read, and shows the order, but
3728 # could be arbitrarily long, so we use it unless it is too long
3729 my $nesting_string =
3730 "($paren_depth [$square_bracket_depth {$brace_depth";
3731 my $nesting_string_new = $$rnesting_tokens[0];
3733 my $ci_level = $$rci_levels[0];
3734 if ( $ci_level > 9 ) { $ci_level = '*' }
3735 my $bk = ( $$rnesting_blocks[0] =~ /1$/ ) ? '1' : '0';
3737 if ( length($nesting_string_new) <= 8 ) {
3739 $nesting_string_new . " " x ( 8 - length($nesting_string_new) );
3741 if ( $python_indentation_level < 0 ) { $python_indentation_level = 0 }
3742 $line_information_string =
3743 "L$input_line_number:$output_line_number$extra_space i$python_indentation_level:$structural_indentation_level $ci_level $bk $nesting_string";
3745 return $line_information_string;
3748 sub logfile_output {
3750 my ( $prompt, $msg ) = @_;
3751 return if ( $self->{_block_log_output} );
3753 my $routput_array = $self->{_output_array};
3754 if ( $self->{_at_end_of_file} || !$self->{_use_prefix} ) {
3755 push @{$routput_array}, "$msg";
3758 my $line_information_string = $self->make_line_information_string();
3759 $self->{_wrote_line_information_string} = 1;
3761 if ($line_information_string) {
3762 push @{$routput_array}, "$line_information_string $prompt$msg";
3765 push @{$routput_array}, "$msg";
3770 sub get_saw_brace_error {
3772 return $self->{_saw_brace_error};
3775 sub increment_brace_error {
3777 $self->{_saw_brace_error}++;
3782 use constant BRACE_WARNING_LIMIT => 10;
3783 my $saw_brace_error = $self->{_saw_brace_error};
3785 if ( $saw_brace_error < BRACE_WARNING_LIMIT ) {
3789 $self->{_saw_brace_error} = $saw_brace_error;
3791 if ( $saw_brace_error == BRACE_WARNING_LIMIT ) {
3792 $self->warning("No further warnings of this type will be given\n");
3798 # handle non-critical warning messages based on input flag
3800 my $rOpts = $self->{_rOpts};
3802 # these appear in .ERR output only if -w flag is used
3803 if ( $rOpts->{'warning-output'} ) {
3807 # otherwise, they go to the .LOG file
3809 $self->{_complaint_count}++;
3810 $self->write_logfile_entry(@_);
3816 # report errors to .ERR file (or stdout)
3818 use constant WARNING_LIMIT => 50;
3820 my $rOpts = $self->{_rOpts};
3821 unless ( $rOpts->{'quiet'} ) {
3823 my $warning_count = $self->{_warning_count};
3824 unless ($warning_count) {
3825 my $warning_file = $self->{_warning_file};
3827 if ( $rOpts->{'standard-error-output'} ) {
3828 $fh_warnings = *STDERR;
3831 ( $fh_warnings, my $filename ) =
3832 Perl::Tidy::streamhandle( $warning_file, 'w' );
3833 $fh_warnings or die("couldn't open $filename $!\n");
3834 warn "## Please see file $filename\n";
3836 $self->{_fh_warnings} = $fh_warnings;
3839 my $fh_warnings = $self->{_fh_warnings};
3840 if ( $warning_count < WARNING_LIMIT ) {
3841 if ( $self->get_use_prefix() > 0 ) {
3842 my $input_line_number =
3843 Perl::Tidy::Tokenizer::get_input_line_number();
3844 $fh_warnings->print("$input_line_number:\t@_");
3845 $self->write_logfile_entry("WARNING: @_");
3848 $fh_warnings->print(@_);
3849 $self->write_logfile_entry(@_);
3853 $self->{_warning_count} = $warning_count;
3855 if ( $warning_count == WARNING_LIMIT ) {
3856 $fh_warnings->print("No further warnings will be given\n");
3861 # programming bug codes:
3863 # 0 = maybe, not sure.
3865 sub report_possible_bug {
3867 my $saw_code_bug = $self->{_saw_code_bug};
3868 $self->{_saw_code_bug} = ( $saw_code_bug < 0 ) ? 0 : $saw_code_bug;
3871 sub report_definite_bug {
3873 $self->{_saw_code_bug} = 1;
3876 sub ask_user_for_bug_report {
3879 my ( $infile_syntax_ok, $formatter ) = @_;
3880 my $saw_code_bug = $self->{_saw_code_bug};
3881 if ( ( $saw_code_bug == 0 ) && ( $infile_syntax_ok == 1 ) ) {
3882 $self->warning(<<EOM);
3884 You may have encountered a code bug in perltidy. If you think so, and
3885 the problem is not listed in the BUGS file at
3886 http://perltidy.sourceforge.net, please report it so that it can be
3887 corrected. Include the smallest possible script which has the problem,
3888 along with the .LOG file. See the manual pages for contact information.
3893 elsif ( $saw_code_bug == 1 ) {
3894 if ( $self->{_saw_extrude} ) {
3895 $self->warning(<<EOM);
3897 You may have encountered a bug in perltidy. However, since you are using the
3898 -extrude option, the problem may be with perl or one of its modules, which have
3899 occasional problems with this type of file. If you believe that the
3900 problem is with perltidy, and the problem is not listed in the BUGS file at
3901 http://perltidy.sourceforge.net, please report it so that it can be corrected.
3902 Include the smallest possible script which has the problem, along with the .LOG
3903 file. See the manual pages for contact information.
3908 $self->warning(<<EOM);
3910 Oops, you seem to have encountered a bug in perltidy. Please check the
3911 BUGS file at http://perltidy.sourceforge.net. If the problem is not
3912 listed there, please report it so that it can be corrected. Include the
3913 smallest possible script which produces this message, along with the
3914 .LOG file if appropriate. See the manual pages for contact information.
3915 Your efforts are appreciated.
3918 my $added_semicolon_count = 0;
3920 $added_semicolon_count =
3921 $formatter->get_added_semicolon_count();
3923 if ( $added_semicolon_count > 0 ) {
3924 $self->warning(<<EOM);
3926 The log file shows that perltidy added $added_semicolon_count semicolons.
3927 Please rerun with -nasc to see if that is the cause of the syntax error. Even
3928 if that is the problem, please report it so that it can be fixed.
3938 # called after all formatting to summarize errors
3940 my ( $infile_syntax_ok, $formatter ) = @_;
3942 my $rOpts = $self->{_rOpts};
3943 my $warning_count = $self->{_warning_count};
3944 my $saw_code_bug = $self->{_saw_code_bug};
3946 my $save_logfile = ( $saw_code_bug == 0 && $infile_syntax_ok == 1 )
3947 || $saw_code_bug == 1
3948 || $rOpts->{'logfile'};
3949 my $log_file = $self->{_log_file};
3950 if ($warning_count) {
3951 if ($save_logfile) {
3952 $self->block_log_output(); # avoid echoing this to the logfile
3954 "The logfile $log_file may contain useful information\n");
3955 $self->unblock_log_output();
3958 if ( $self->{_complaint_count} > 0 ) {
3960 "To see $self->{_complaint_count} non-critical warnings rerun with -w\n"
3964 if ( $self->{_saw_brace_error}
3965 && ( $rOpts->{'logfile-gap'} > 1 || !$save_logfile ) )
3967 $self->warning("To save a full .LOG file rerun with -g\n");
3970 $self->ask_user_for_bug_report( $infile_syntax_ok, $formatter );
3972 if ($save_logfile) {
3973 my $log_file = $self->{_log_file};
3974 my ( $fh, $filename ) = Perl::Tidy::streamhandle( $log_file, 'w' );
3976 my $routput_array = $self->{_output_array};
3977 foreach ( @{$routput_array} ) { $fh->print($_) }
3978 eval { $fh->close() };
3983 #####################################################################
3985 # The Perl::Tidy::DevNull class supplies a dummy print method
3987 #####################################################################
3989 package Perl::Tidy::DevNull;
3990 sub new { return bless {}, $_[0] }
3991 sub print { return }
3992 sub close { return }
3994 #####################################################################
3996 # The Perl::Tidy::HtmlWriter class writes a copy of the input stream in html
3998 #####################################################################
4000 package Perl::Tidy::HtmlWriter;
4010 %short_to_long_names
4014 $missing_html_entities
4017 # replace unsafe characters with HTML entity representation if HTML::Entities
4019 { eval "use HTML::Entities"; $missing_html_entities = $@; }
4023 my ( $class, $input_file, $html_file, $extension, $html_toc_extension,
4024 $html_src_extension )
4027 my $html_file_opened = 0;
4029 ( $html_fh, my $html_filename ) =
4030 Perl::Tidy::streamhandle( $html_file, 'w' );
4032 warn("can't open $html_file: $!\n");
4035 $html_file_opened = 1;
4037 if ( !$input_file || $input_file eq '-' || ref($input_file) ) {
4038 $input_file = "NONAME";
4041 # write the table of contents to a string
4043 my $html_toc_fh = Perl::Tidy::IOScalar->new( \$toc_string, 'w' );
4046 my @pre_string_stack;
4047 if ( $rOpts->{'html-pre-only'} ) {
4049 # pre section goes directly to the output stream
4050 $html_pre_fh = $html_fh;
4051 $html_pre_fh->print( <<"PRE_END");
4057 # pre section go out to a temporary string
4059 $html_pre_fh = Perl::Tidy::IOScalar->new( \$pre_string, 'w' );
4060 push @pre_string_stack, \$pre_string;
4063 # pod text gets diverted if the 'pod2html' is used
4066 if ( $rOpts->{'pod2html'} ) {
4067 if ( $rOpts->{'html-pre-only'} ) {
4068 undef $rOpts->{'pod2html'};
4071 eval "use Pod::Html";
4074 "unable to find Pod::Html; cannot use pod2html\n-npod disables this message\n";
4075 undef $rOpts->{'pod2html'};
4078 $html_pod_fh = Perl::Tidy::IOScalar->new( \$pod_string, 'w' );
4085 if ( $rOpts->{'frames'} ) {
4086 unless ($extension) {
4088 "cannot use frames without a specified output extension; ignoring -frm\n";
4089 undef $rOpts->{'frames'};
4092 $toc_filename = $input_file . $html_toc_extension . $extension;
4093 $src_filename = $input_file . $html_src_extension . $extension;
4097 # ----------------------------------------------------------
4098 # Output is now directed as follows:
4099 # html_toc_fh <-- table of contents items
4100 # html_pre_fh <-- the <pre> section of formatted code, except:
4101 # html_pod_fh <-- pod goes here with the pod2html option
4102 # ----------------------------------------------------------
4104 my $title = $rOpts->{'title'};
4106 ( $title, my $path ) = fileparse($input_file);
4108 my $toc_item_count = 0;
4109 my $in_toc_package = "";
4112 _input_file => $input_file, # name of input file
4113 _title => $title, # title, unescaped
4114 _html_file => $html_file, # name of .html output file
4115 _toc_filename => $toc_filename, # for frames option
4116 _src_filename => $src_filename, # for frames option
4117 _html_file_opened => $html_file_opened, # a flag
4118 _html_fh => $html_fh, # the output stream
4119 _html_pre_fh => $html_pre_fh, # pre section goes here
4120 _rpre_string_stack => \@pre_string_stack, # stack of pre sections
4121 _html_pod_fh => $html_pod_fh, # pod goes here if pod2html
4122 _rpod_string => \$pod_string, # string holding pod
4123 _pod_cut_count => 0, # how many =cut's?
4124 _html_toc_fh => $html_toc_fh, # fh for table of contents
4125 _rtoc_string => \$toc_string, # string holding toc
4126 _rtoc_item_count => \$toc_item_count, # how many toc items
4127 _rin_toc_package => \$in_toc_package, # package name
4128 _rtoc_name_count => {}, # hash to track unique names
4129 _rpackage_stack => [], # stack to check for package
4131 _rlast_level => \$last_level, # brace indentation level
4137 # Add an item to the html table of contents.
4138 # This is called even if no table of contents is written,
4139 # because we still want to put the anchors in the <pre> text.
4140 # We are given an anchor name and its type; types are:
4141 # 'package', 'sub', '__END__', '__DATA__', 'EOF'
4142 # There must be an 'EOF' call at the end to wrap things up.
4144 my ( $name, $type ) = @_;
4145 my $html_toc_fh = $self->{_html_toc_fh};
4146 my $html_pre_fh = $self->{_html_pre_fh};
4147 my $rtoc_name_count = $self->{_rtoc_name_count};
4148 my $rtoc_item_count = $self->{_rtoc_item_count};
4149 my $rlast_level = $self->{_rlast_level};
4150 my $rin_toc_package = $self->{_rin_toc_package};
4151 my $rpackage_stack = $self->{_rpackage_stack};
4153 # packages contain sublists of subs, so to avoid errors all package
4154 # items are written and finished with the following routines
4155 my $end_package_list = sub {
4156 if ($$rin_toc_package) {
4157 $html_toc_fh->print("</ul>\n</li>\n");
4158 $$rin_toc_package = "";
4162 my $start_package_list = sub {
4163 my ( $unique_name, $package ) = @_;
4164 if ($$rin_toc_package) { $end_package_list->() }
4165 $html_toc_fh->print(<<EOM);
4166 <li><a href=\"#$unique_name\">package $package</a>
4169 $$rin_toc_package = $package;
4172 # start the table of contents on the first item
4173 unless ($$rtoc_item_count) {
4175 # but just quit if we hit EOF without any other entries
4176 # in this case, there will be no toc
4177 return if ( $type eq 'EOF' );
4178 $html_toc_fh->print( <<"TOC_END");
4179 <!-- BEGIN CODE INDEX --><a name="code-index"></a>
4183 $$rtoc_item_count++;
4185 # make a unique anchor name for this location:
4186 # - packages get a 'package-' prefix
4187 # - subs use their names
4188 my $unique_name = $name;
4189 if ( $type eq 'package' ) { $unique_name = "package-$name" }
4191 # append '-1', '-2', etc if necessary to make unique; this will
4192 # be unique because subs and packages cannot have a '-'
4193 if ( my $count = $rtoc_name_count->{ lc $unique_name }++ ) {
4194 $unique_name .= "-$count";
4197 # - all names get terminal '-' if pod2html is used, to avoid
4198 # conflicts with anchor names created by pod2html
4199 if ( $rOpts->{'pod2html'} ) { $unique_name .= '-' }
4201 # start/stop lists of subs
4202 if ( $type eq 'sub' ) {
4203 my $package = $rpackage_stack->[$$rlast_level];
4204 unless ($package) { $package = 'main' }
4206 # if we're already in a package/sub list, be sure its the right
4207 # package or else close it
4208 if ( $$rin_toc_package && $$rin_toc_package ne $package ) {
4209 $end_package_list->();
4212 # start a package/sub list if necessary
4213 unless ($$rin_toc_package) {
4214 $start_package_list->( $unique_name, $package );
4218 # now write an entry in the toc for this item
4219 if ( $type eq 'package' ) {
4220 $start_package_list->( $unique_name, $name );
4222 elsif ( $type eq 'sub' ) {
4223 $html_toc_fh->print("<li><a href=\"#$unique_name\">$name</a></li>\n");
4226 $end_package_list->();
4227 $html_toc_fh->print("<li><a href=\"#$unique_name\">$name</a></li>\n");
4230 # write the anchor in the <pre> section
4231 $html_pre_fh->print("<a name=\"$unique_name\"></a>");
4233 # end the table of contents, if any, on the end of file
4234 if ( $type eq 'EOF' ) {
4235 $html_toc_fh->print( <<"TOC_END");
4237 <!-- END CODE INDEX -->
4244 # This is the official list of tokens which may be identified by the
4245 # user. Long names are used as getopt keys. Short names are
4246 # convenient short abbreviations for specifying input. Short names
4247 # somewhat resemble token type characters, but are often different
4248 # because they may only be alphanumeric, to allow command line
4249 # input. Also, note that because of case insensitivity of html,
4250 # this table must be in a single case only (I've chosen to use all
4252 # When adding NEW_TOKENS: update this hash table
4253 # short names => long names
4254 %short_to_long_names = (
4264 'pu' => 'punctuation',
4265 'i' => 'identifier',
4267 'h' => 'here-doc-target',
4268 'hh' => 'here-doc-text',
4270 'sc' => 'semicolon',
4271 'm' => 'subroutine',
4275 # Now we have to map actual token types into one of the above short
4276 # names; any token types not mapped will get 'punctuation'
4279 # The values of this hash table correspond to the keys of the
4280 # previous hash table.
4281 # The keys of this hash table are token types and can be seen
4282 # by running with --dump-token-types (-dtt).
4284 # When adding NEW_TOKENS: update this hash table
4285 # $type => $short_name
4286 %token_short_names = (
4311 # These token types will all be called identifiers for now
4312 # FIXME: need to separate user defined modules as separate type
4313 my @identifier = qw" i t U C Y Z G :: ";
4314 @token_short_names{@identifier} = ('i') x scalar(@identifier);
4316 # These token types will be called 'structure'
4317 my @structure = qw" { } ";
4318 @token_short_names{@structure} = ('s') x scalar(@structure);
4320 # OLD NOTES: save for reference
4321 # Any of these could be added later if it would be useful.
4322 # For now, they will by default become punctuation
4323 # my @list = qw" L R [ ] ";
4324 # @token_long_names{@list} = ('non-structure') x scalar(@list);
4327 # / /= * *= ** **= + += - -= % %= = ++ -- << <<= >> >>= pp p m mm
4329 # @token_long_names{@list} = ('math') x scalar(@list);
4331 # my @list = qw" & &= ~ ~= ^ ^= | |= ";
4332 # @token_long_names{@list} = ('bit') x scalar(@list);
4334 # my @list = qw" == != < > <= <=> ";
4335 # @token_long_names{@list} = ('numerical-comparison') x scalar(@list);
4337 # my @list = qw" && || ! &&= ||= //= ";
4338 # @token_long_names{@list} = ('logical') x scalar(@list);
4340 # my @list = qw" . .= =~ !~ x x= ";
4341 # @token_long_names{@list} = ('string-operators') x scalar(@list);
4344 # my @list = qw" .. -> <> ... \ ? ";
4345 # @token_long_names{@list} = ('misc-operators') x scalar(@list);
4349 sub make_getopt_long_names {
4351 my ($rgetopt_names) = @_;
4352 while ( my ( $short_name, $name ) = each %short_to_long_names ) {
4353 push @$rgetopt_names, "html-color-$name=s";
4354 push @$rgetopt_names, "html-italic-$name!";
4355 push @$rgetopt_names, "html-bold-$name!";
4357 push @$rgetopt_names, "html-color-background=s";
4358 push @$rgetopt_names, "html-linked-style-sheet=s";
4359 push @$rgetopt_names, "nohtml-style-sheets";
4360 push @$rgetopt_names, "html-pre-only";
4361 push @$rgetopt_names, "html-line-numbers";
4362 push @$rgetopt_names, "html-entities!";
4363 push @$rgetopt_names, "stylesheet";
4364 push @$rgetopt_names, "html-table-of-contents!";
4365 push @$rgetopt_names, "pod2html!";
4366 push @$rgetopt_names, "frames!";
4367 push @$rgetopt_names, "html-toc-extension=s";
4368 push @$rgetopt_names, "html-src-extension=s";
4370 # Pod::Html parameters:
4371 push @$rgetopt_names, "backlink=s";
4372 push @$rgetopt_names, "cachedir=s";
4373 push @$rgetopt_names, "htmlroot=s";
4374 push @$rgetopt_names, "libpods=s";
4375 push @$rgetopt_names, "podpath=s";
4376 push @$rgetopt_names, "podroot=s";
4377 push @$rgetopt_names, "title=s";
4379 # Pod::Html parameters with leading 'pod' which will be removed
4380 # before the call to Pod::Html
4381 push @$rgetopt_names, "podquiet!";
4382 push @$rgetopt_names, "podverbose!";
4383 push @$rgetopt_names, "podrecurse!";
4384 push @$rgetopt_names, "podflush";
4385 push @$rgetopt_names, "podheader!";
4386 push @$rgetopt_names, "podindex!";
4389 sub make_abbreviated_names {
4391 # We're appending things like this to the expansion list:
4392 # 'hcc' => [qw(html-color-comment)],
4393 # 'hck' => [qw(html-color-keyword)],
4396 my ($rexpansion) = @_;
4398 # abbreviations for color/bold/italic properties
4399 while ( my ( $short_name, $long_name ) = each %short_to_long_names ) {
4400 ${$rexpansion}{"hc$short_name"} = ["html-color-$long_name"];
4401 ${$rexpansion}{"hb$short_name"} = ["html-bold-$long_name"];
4402 ${$rexpansion}{"hi$short_name"} = ["html-italic-$long_name"];
4403 ${$rexpansion}{"nhb$short_name"} = ["nohtml-bold-$long_name"];
4404 ${$rexpansion}{"nhi$short_name"} = ["nohtml-italic-$long_name"];
4407 # abbreviations for all other html options
4408 ${$rexpansion}{"hcbg"} = ["html-color-background"];
4409 ${$rexpansion}{"pre"} = ["html-pre-only"];
4410 ${$rexpansion}{"toc"} = ["html-table-of-contents"];
4411 ${$rexpansion}{"ntoc"} = ["nohtml-table-of-contents"];
4412 ${$rexpansion}{"nnn"} = ["html-line-numbers"];
4413 ${$rexpansion}{"hent"} = ["html-entities"];
4414 ${$rexpansion}{"nhent"} = ["nohtml-entities"];
4415 ${$rexpansion}{"css"} = ["html-linked-style-sheet"];
4416 ${$rexpansion}{"nss"} = ["nohtml-style-sheets"];
4417 ${$rexpansion}{"ss"} = ["stylesheet"];
4418 ${$rexpansion}{"pod"} = ["pod2html"];
4419 ${$rexpansion}{"npod"} = ["nopod2html"];
4420 ${$rexpansion}{"frm"} = ["frames"];
4421 ${$rexpansion}{"nfrm"} = ["noframes"];
4422 ${$rexpansion}{"text"} = ["html-toc-extension"];
4423 ${$rexpansion}{"sext"} = ["html-src-extension"];
4428 # This will be called once after options have been parsed
4432 # X11 color names for default settings that seemed to look ok
4433 # (these color names are only used for programming clarity; the hex
4434 # numbers are actually written)
4435 use constant ForestGreen => "#228B22";
4436 use constant SaddleBrown => "#8B4513";
4437 use constant magenta4 => "#8B008B";
4438 use constant IndianRed3 => "#CD5555";
4439 use constant DeepSkyBlue4 => "#00688B";
4440 use constant MediumOrchid3 => "#B452CD";
4441 use constant black => "#000000";
4442 use constant white => "#FFFFFF";
4443 use constant red => "#FF0000";
4445 # set default color, bold, italic properties
4446 # anything not listed here will be given the default (punctuation) color --
4447 # these types currently not listed and get default: ws pu s sc cm co p
4448 # When adding NEW_TOKENS: add an entry here if you don't want defaults
4450 # set_default_properties( $short_name, default_color, bold?, italic? );
4451 set_default_properties( 'c', ForestGreen, 0, 0 );
4452 set_default_properties( 'pd', ForestGreen, 0, 1 );
4453 set_default_properties( 'k', magenta4, 1, 0 ); # was SaddleBrown
4454 set_default_properties( 'q', IndianRed3, 0, 0 );
4455 set_default_properties( 'hh', IndianRed3, 0, 1 );
4456 set_default_properties( 'h', IndianRed3, 1, 0 );
4457 set_default_properties( 'i', DeepSkyBlue4, 0, 0 );
4458 set_default_properties( 'w', black, 0, 0 );
4459 set_default_properties( 'n', MediumOrchid3, 0, 0 );
4460 set_default_properties( 'v', MediumOrchid3, 0, 0 );
4461 set_default_properties( 'j', IndianRed3, 1, 0 );
4462 set_default_properties( 'm', red, 1, 0 );
4464 set_default_color( 'html-color-background', white );
4465 set_default_color( 'html-color-punctuation', black );
4467 # setup property lookup tables for tokens based on their short names
4468 # every token type has a short name, and will use these tables
4469 # to do the html markup
4470 while ( my ( $short_name, $long_name ) = each %short_to_long_names ) {
4471 $html_color{$short_name} = $rOpts->{"html-color-$long_name"};
4472 $html_bold{$short_name} = $rOpts->{"html-bold-$long_name"};
4473 $html_italic{$short_name} = $rOpts->{"html-italic-$long_name"};
4476 # write style sheet to STDOUT and die if requested
4477 if ( defined( $rOpts->{'stylesheet'} ) ) {
4478 write_style_sheet_file('-');
4482 # make sure user gives a file name after -css
4483 if ( defined( $rOpts->{'html-linked-style-sheet'} ) ) {
4484 $css_linkname = $rOpts->{'html-linked-style-sheet'};
4485 if ( $css_linkname =~ /^-/ ) {
4486 die "You must specify a valid filename after -css\n";
4490 # check for conflict
4491 if ( $css_linkname && $rOpts->{'nohtml-style-sheets'} ) {
4492 $rOpts->{'nohtml-style-sheets'} = 0;
4493 warning("You can't specify both -css and -nss; -nss ignored\n");
4496 # write a style sheet file if necessary
4497 if ($css_linkname) {
4499 # if the selected filename exists, don't write, because user may
4500 # have done some work by hand to create it; use backup name instead
4501 # Also, this will avoid a potential disaster in which the user
4502 # forgets to specify the style sheet, like this:
4503 # perltidy -html -css myfile1.pl myfile2.pl
4504 # This would cause myfile1.pl to parsed as the style sheet by GetOpts
4505 my $css_filename = $css_linkname;
4506 unless ( -e $css_filename ) {
4507 write_style_sheet_file($css_filename);
4510 $missing_html_entities = 1 unless $rOpts->{'html-entities'};
4513 sub write_style_sheet_file {
4515 my $css_filename = shift;
4517 unless ( $fh = IO::File->new("> $css_filename") ) {
4518 die "can't open $css_filename: $!\n";
4520 write_style_sheet_data($fh);
4521 eval { $fh->close };
4524 sub write_style_sheet_data {
4526 # write the style sheet data to an open file handle
4529 my $bg_color = $rOpts->{'html-color-background'};
4530 my $text_color = $rOpts->{'html-color-punctuation'};
4532 # pre-bgcolor is new, and may not be defined
4533 my $pre_bg_color = $rOpts->{'html-pre-color-background'};
4534 $pre_bg_color = $bg_color unless $pre_bg_color;
4536 $fh->print(<<"EOM");
4537 /* default style sheet generated by perltidy */
4538 body {background: $bg_color; color: $text_color}
4539 pre { color: $text_color;
4540 background: $pre_bg_color;
4541 font-family: courier;
4546 foreach my $short_name ( sort keys %short_to_long_names ) {
4547 my $long_name = $short_to_long_names{$short_name};
4549 my $abbrev = '.' . $short_name;
4550 if ( length($short_name) == 1 ) { $abbrev .= ' ' } # for alignment
4551 my $color = $html_color{$short_name};
4552 if ( !defined($color) ) { $color = $text_color }
4553 $fh->print("$abbrev \{ color: $color;");
4555 if ( $html_bold{$short_name} ) {
4556 $fh->print(" font-weight:bold;");
4559 if ( $html_italic{$short_name} ) {
4560 $fh->print(" font-style:italic;");
4562 $fh->print("} /* $long_name */\n");
4566 sub set_default_color {
4568 # make sure that options hash $rOpts->{$key} contains a valid color
4569 my ( $key, $color ) = @_;
4570 if ( $rOpts->{$key} ) { $color = $rOpts->{$key} }
4571 $rOpts->{$key} = check_RGB($color);
4576 # if color is a 6 digit hex RGB value, prepend a #, otherwise
4577 # assume that it is a valid ascii color name
4579 if ( $color =~ /^[0-9a-fA-F]{6,6}$/ ) { $color = "#$color" }
4583 sub set_default_properties {
4584 my ( $short_name, $color, $bold, $italic ) = @_;
4586 set_default_color( "html-color-$short_to_long_names{$short_name}", $color );
4588 $key = "html-bold-$short_to_long_names{$short_name}";
4589 $rOpts->{$key} = ( defined $rOpts->{$key} ) ? $rOpts->{$key} : $bold;
4590 $key = "html-italic-$short_to_long_names{$short_name}";
4591 $rOpts->{$key} = ( defined $rOpts->{$key} ) ? $rOpts->{$key} : $italic;
4596 # Use Pod::Html to process the pod and make the page
4597 # then merge the perltidy code sections into it.
4598 # return 1 if success, 0 otherwise
4600 my ( $pod_string, $css_string, $toc_string, $rpre_string_stack ) = @_;
4601 my $input_file = $self->{_input_file};
4602 my $title = $self->{_title};
4603 my $success_flag = 0;
4605 # don't try to use pod2html if no pod
4606 unless ($pod_string) {
4607 return $success_flag;
4610 # Pod::Html requires a real temporary filename
4611 # If we are making a frame, we have a name available
4612 # Otherwise, we have to fine one
4614 if ( $rOpts->{'frames'} ) {
4615 $tmpfile = $self->{_toc_filename};
4618 $tmpfile = Perl::Tidy::make_temporary_filename();
4620 my $fh_tmp = IO::File->new( $tmpfile, 'w' );
4622 warn "unable to open temporary file $tmpfile; cannot use pod2html\n";
4623 return $success_flag;
4626 #------------------------------------------------------------------
4627 # Warning: a temporary file is open; we have to clean up if
4628 # things go bad. From here on all returns should be by going to
4629 # RETURN so that the temporary file gets unlinked.
4630 #------------------------------------------------------------------
4632 # write the pod text to the temporary file
4633 $fh_tmp->print($pod_string);
4636 # Hand off the pod to pod2html.
4637 # Note that we can use the same temporary filename for input and output
4638 # because of the way pod2html works.
4642 push @args, "--infile=$tmpfile", "--outfile=$tmpfile", "--title=$title";
4645 # Flags with string args:
4646 # "backlink=s", "cachedir=s", "htmlroot=s", "libpods=s",
4647 # "podpath=s", "podroot=s"
4648 # Note: -css=s is handled by perltidy itself
4649 foreach $kw (qw(backlink cachedir htmlroot libpods podpath podroot)) {
4650 if ( $rOpts->{$kw} ) { push @args, "--$kw=$rOpts->{$kw}" }
4653 # Toggle switches; these have extra leading 'pod'
4654 # "header!", "index!", "recurse!", "quiet!", "verbose!"
4655 foreach $kw (qw(podheader podindex podrecurse podquiet podverbose)) {
4656 my $kwd = $kw; # allows us to strip 'pod'
4657 if ( $rOpts->{$kw} ) { $kwd =~ s/^pod//; push @args, "--$kwd" }
4658 elsif ( defined( $rOpts->{$kw} ) ) {
4660 push @args, "--no$kwd";
4666 if ( $rOpts->{$kw} ) { $kw =~ s/^pod//; push @args, "--$kw" }
4668 # Must clean up if pod2html dies (it can);
4669 # Be careful not to overwrite callers __DIE__ routine
4670 local $SIG{__DIE__} = sub {
4672 unlink $tmpfile if -e $tmpfile;
4678 $fh_tmp = IO::File->new( $tmpfile, 'r' );
4681 # this error shouldn't happen ... we just used this filename
4682 warn "unable to open temporary file $tmpfile; cannot use pod2html\n";
4686 my $html_fh = $self->{_html_fh};
4691 # This routine will write the html selectively and store the toc
4692 my $html_print = sub {
4694 $html_fh->print($_) unless ($no_print);
4695 if ($in_toc) { push @toc, $_ }
4699 # loop over lines of html output from pod2html and merge in
4700 # the necessary perltidy html sections
4701 my ( $saw_body, $saw_index, $saw_body_end );
4702 while ( my $line = $fh_tmp->getline() ) {
4704 if ( $line =~ /^\s*<html>\s*$/i ) {
4705 my $date = localtime;
4706 $html_print->("<!-- Generated by perltidy on $date -->\n");
4707 $html_print->($line);
4710 # Copy the perltidy css, if any, after <body> tag
4711 elsif ( $line =~ /^\s*<body.*>\s*$/i ) {
4713 $html_print->($css_string) if $css_string;
4714 $html_print->($line);
4716 # add a top anchor and heading
4717 $html_print->("<a name=\"-top-\"></a>\n");
4718 $title = escape_html($title);
4719 $html_print->("<h1>$title</h1>\n");
4721 elsif ( $line =~ /^\s*<!-- INDEX BEGIN -->\s*$/i ) {
4724 # when frames are used, an extra table of contents in the
4725 # contents panel is confusing, so don't print it
4726 $no_print = $rOpts->{'frames'}
4727 || !$rOpts->{'html-table-of-contents'};
4728 $html_print->("<h2>Doc Index:</h2>\n") if $rOpts->{'frames'};
4729 $html_print->($line);
4732 # Copy the perltidy toc, if any, after the Pod::Html toc
4733 elsif ( $line =~ /^\s*<!-- INDEX END -->\s*$/i ) {
4735 $html_print->($line);
4737 $html_print->("<hr />\n") if $rOpts->{'frames'};
4738 $html_print->("<h2>Code Index:</h2>\n");
4739 my @toc = map { $_ .= "\n" } split /\n/, $toc_string;
4740 $html_print->(@toc);
4746 # Copy one perltidy section after each marker
4747 elsif ( $line =~ /^(.*)<!-- pERLTIDY sECTION -->(.*)$/ ) {
4749 $html_print->($1) if $1;
4751 # Intermingle code and pod sections if we saw multiple =cut's.
4752 if ( $self->{_pod_cut_count} > 1 ) {
4753 my $rpre_string = shift(@$rpre_string_stack);
4754 if ($$rpre_string) {
4755 $html_print->('<pre>');
4756 $html_print->($$rpre_string);
4757 $html_print->('</pre>');
4761 # shouldn't happen: we stored a string before writing
4764 "Problem merging html stream with pod2html; order may be wrong\n";
4766 $html_print->($line);
4769 # If didn't see multiple =cut lines, we'll put the pod out first
4770 # and then the code, because it's less confusing.
4773 # since we are not intermixing code and pod, we don't need
4774 # or want any <hr> lines which separated pod and code
4775 $html_print->($line) unless ( $line =~ /^\s*<hr>\s*$/i );
4779 # Copy any remaining code section before the </body> tag
4780 elsif ( $line =~ /^\s*<\/body>\s*$/i ) {
4782 if (@$rpre_string_stack) {
4783 unless ( $self->{_pod_cut_count} > 1 ) {
4784 $html_print->('<hr />');
4786 while ( my $rpre_string = shift(@$rpre_string_stack) ) {
4787 $html_print->('<pre>');
4788 $html_print->($$rpre_string);
4789 $html_print->('</pre>');
4792 $html_print->($line);
4795 $html_print->($line);
4800 unless ($saw_body) {
4801 warn "Did not see <body> in pod2html output\n";
4804 unless ($saw_body_end) {
4805 warn "Did not see </body> in pod2html output\n";
4808 unless ($saw_index) {
4809 warn "Did not find INDEX END in pod2html output\n";
4814 eval { $html_fh->close() };
4816 # note that we have to unlink tmpfile before making frames
4817 # because the tmpfile may be one of the names used for frames
4818 unlink $tmpfile if -e $tmpfile;
4819 if ( $success_flag && $rOpts->{'frames'} ) {
4820 $self->make_frame( \@toc );
4822 return $success_flag;
4827 # Make a frame with table of contents in the left panel
4828 # and the text in the right panel.
4830 # $html_filename contains the no-frames html output
4831 # $rtoc is a reference to an array with the table of contents
4834 my $input_file = $self->{_input_file};
4835 my $html_filename = $self->{_html_file};
4836 my $toc_filename = $self->{_toc_filename};
4837 my $src_filename = $self->{_src_filename};
4838 my $title = $self->{_title};
4839 $title = escape_html($title);
4841 # FUTURE input parameter:
4842 my $top_basename = "";
4844 # We need to produce 3 html files:
4845 # 1. - the table of contents
4846 # 2. - the contents (source code) itself
4847 # 3. - the frame which contains them
4849 # get basenames for relative links
4850 my ( $toc_basename, $toc_path ) = fileparse($toc_filename);
4851 my ( $src_basename, $src_path ) = fileparse($src_filename);
4853 # 1. Make the table of contents panel, with appropriate changes
4854 # to the anchor names
4855 my $src_frame_name = 'SRC';
4857 write_toc_html( $title, $toc_filename, $src_basename, $rtoc,
4860 # 2. The current .html filename is renamed to be the contents panel
4861 rename( $html_filename, $src_filename )
4862 or die "Cannot rename $html_filename to $src_filename:$!\n";
4864 # 3. Then use the original html filename for the frame
4866 $title, $html_filename, $top_basename,
4867 $toc_basename, $src_basename, $src_frame_name
4871 sub write_toc_html {
4873 # write a separate html table of contents file for frames
4874 my ( $title, $toc_filename, $src_basename, $rtoc, $src_frame_name ) = @_;
4875 my $fh = IO::File->new( $toc_filename, 'w' )
4876 or die "Cannot open $toc_filename:$!\n";
4880 <title>$title</title>
4883 <h1><a href=\"$src_basename#-top-" target="$src_frame_name">$title</a></h1>
4887 change_anchor_names( $rtoc, $src_basename, "$src_frame_name" );
4888 $fh->print( join "", @$rtoc );
4897 sub write_frame_html {
4899 # write an html file to be the table of contents frame
4901 $title, $frame_filename, $top_basename,
4902 $toc_basename, $src_basename, $src_frame_name
4905 my $fh = IO::File->new( $frame_filename, 'w' )
4906 or die "Cannot open $toc_basename:$!\n";
4909 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
4910 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
4911 <?xml version="1.0" encoding="iso-8859-1" ?>
4912 <html xmlns="http://www.w3.org/1999/xhtml">
4914 <title>$title</title>
4918 # two left panels, one right, if master index file
4919 if ($top_basename) {
4921 <frameset cols="20%,80%">
4922 <frameset rows="30%,70%">
4923 <frame src = "$top_basename" />
4924 <frame src = "$toc_basename" />
4929 # one left panels, one right, if no master index file
4932 <frameset cols="20%,*">
4933 <frame src = "$toc_basename" />
4937 <frame src = "$src_basename" name = "$src_frame_name" />
4940 <p>If you see this message, you are using a non-frame-capable web client.</p>
4941 <p>This document contains:</p>
4943 <li><a href="$toc_basename">A table of contents</a></li>
4944 <li><a href="$src_basename">The source code</a></li>
4953 sub change_anchor_names {
4955 # add a filename and target to anchors
4956 # also return the first anchor
4957 my ( $rlines, $filename, $target ) = @_;
4959 foreach my $line (@$rlines) {
4961 # We're looking for lines like this:
4962 # <LI><A HREF="#synopsis">SYNOPSIS</A></LI>
4963 # ---- - -------- -----------------
4965 if ( $line =~ /^(.*)<a(.*)href\s*=\s*"([^#]*)#([^"]+)"[^>]*>(.*)$/i ) {
4969 my $href = "$filename#$name";
4970 $line = "$pre<a href=\"$href\" target=\"$target\">$post\n";
4971 unless ($first_anchor) { $first_anchor = $href }
4974 return $first_anchor;
4977 sub close_html_file {
4979 return unless $self->{_html_file_opened};
4981 my $html_fh = $self->{_html_fh};
4982 my $rtoc_string = $self->{_rtoc_string};
4984 # There are 3 basic paths to html output...
4986 # ---------------------------------
4987 # Path 1: finish up if in -pre mode
4988 # ---------------------------------
4989 if ( $rOpts->{'html-pre-only'} ) {
4990 $html_fh->print( <<"PRE_END");
4993 eval { $html_fh->close() };
4998 $self->add_toc_item( 'EOF', 'EOF' );
5000 my $rpre_string_stack = $self->{_rpre_string_stack};
5002 # Patch to darken the <pre> background color in case of pod2html and
5003 # interleaved code/documentation. Otherwise, the distinction
5004 # between code and documentation is blurred.
5005 if ( $rOpts->{pod2html}
5006 && $self->{_pod_cut_count} >= 1
5007 && $rOpts->{'html-color-background'} eq '#FFFFFF' )
5009 $rOpts->{'html-pre-color-background'} = '#F0F0F0';
5012 # put the css or its link into a string, if used
5014 my $fh_css = Perl::Tidy::IOScalar->new( \$css_string, 'w' );
5016 # use css linked to another file
5017 if ( $rOpts->{'html-linked-style-sheet'} ) {
5019 qq(<link rel="stylesheet" href="$css_linkname" type="text/css" />)
5023 # use css embedded in this file
5024 elsif ( !$rOpts->{'nohtml-style-sheets'} ) {
5025 $fh_css->print( <<'ENDCSS');
5026 <style type="text/css">
5029 write_style_sheet_data($fh_css);
5030 $fh_css->print( <<"ENDCSS");
5036 # -----------------------------------------------------------
5037 # path 2: use pod2html if requested
5038 # If we fail for some reason, continue on to path 3
5039 # -----------------------------------------------------------
5040 if ( $rOpts->{'pod2html'} ) {
5041 my $rpod_string = $self->{_rpod_string};
5042 $self->pod_to_html( $$rpod_string, $css_string, $$rtoc_string,
5043 $rpre_string_stack )
5047 # --------------------------------------------------
5048 # path 3: write code in html, with pod only in italics
5049 # --------------------------------------------------
5050 my $input_file = $self->{_input_file};
5051 my $title = escape_html($input_file);
5052 my $date = localtime;
5053 $html_fh->print( <<"HTML_START");
5054 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
5055 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
5056 <!-- Generated by perltidy on $date -->
5057 <html xmlns="http://www.w3.org/1999/xhtml">
5059 <title>$title</title>
5062 # output the css, if used
5064 $html_fh->print($css_string);
5065 $html_fh->print( <<"ENDCSS");
5072 $html_fh->print( <<"HTML_START");
5074 <body bgcolor=\"$rOpts->{'html-color-background'}\" text=\"$rOpts->{'html-color-punctuation'}\">
5078 $html_fh->print("<a name=\"-top-\"></a>\n");
5079 $html_fh->print( <<"EOM");
5083 # copy the table of contents
5085 && !$rOpts->{'frames'}
5086 && $rOpts->{'html-table-of-contents'} )
5088 $html_fh->print($$rtoc_string);
5091 # copy the pre section(s)
5092 my $fname_comment = $input_file;
5093 $fname_comment =~ s/--+/-/g; # protect HTML comment tags
5094 $html_fh->print( <<"END_PRE");
5096 <!-- contents of filename: $fname_comment -->
5100 foreach my $rpre_string (@$rpre_string_stack) {
5101 $html_fh->print($$rpre_string);
5104 # and finish the html page
5105 $html_fh->print( <<"HTML_END");
5110 eval { $html_fh->close() }; # could be object without close method
5112 if ( $rOpts->{'frames'} ) {
5113 my @toc = map { $_ .= "\n" } split /\n/, $$rtoc_string;
5114 $self->make_frame( \@toc );
5120 my ( $rtokens, $rtoken_type, $rlevels ) = @_;
5121 my ( @colored_tokens, $j, $string, $type, $token, $level );
5122 my $rlast_level = $self->{_rlast_level};
5123 my $rpackage_stack = $self->{_rpackage_stack};
5125 for ( $j = 0 ; $j < @$rtoken_type ; $j++ ) {
5126 $type = $$rtoken_type[$j];
5127 $token = $$rtokens[$j];
5128 $level = $$rlevels[$j];
5129 $level = 0 if ( $level < 0 );
5131 #-------------------------------------------------------
5132 # Update the package stack. The package stack is needed to keep
5133 # the toc correct because some packages may be declared within
5134 # blocks and go out of scope when we leave the block.
5135 #-------------------------------------------------------
5136 if ( $level > $$rlast_level ) {
5137 unless ( $rpackage_stack->[ $level - 1 ] ) {
5138 $rpackage_stack->[ $level - 1 ] = 'main';
5140 $rpackage_stack->[$level] = $rpackage_stack->[ $level - 1 ];
5142 elsif ( $level < $$rlast_level ) {
5143 my $package = $rpackage_stack->[$level];
5144 unless ($package) { $package = 'main' }
5146 # if we change packages due to a nesting change, we
5147 # have to make an entry in the toc
5148 if ( $package ne $rpackage_stack->[ $level + 1 ] ) {
5149 $self->add_toc_item( $package, 'package' );
5152 $$rlast_level = $level;
5154 #-------------------------------------------------------
5155 # Intercept a sub name here; split it
5156 # into keyword 'sub' and sub name; and add an
5158 #-------------------------------------------------------
5159 if ( $type eq 'i' && $token =~ /^(sub\s+)(\w.*)$/ ) {
5160 $token = $self->markup_html_element( $1, 'k' );
5161 push @colored_tokens, $token;
5165 # but don't include sub declarations in the toc;
5166 # these wlll have leading token types 'i;'
5167 my $signature = join "", @$rtoken_type;
5168 unless ( $signature =~ /^i;/ ) {
5169 my $subname = $token;
5170 $subname =~ s/[\s\(].*$//; # remove any attributes and prototype
5171 $self->add_toc_item( $subname, 'sub' );
5175 #-------------------------------------------------------
5176 # Intercept a package name here; split it
5177 # into keyword 'package' and name; add to the toc,
5178 # and update the package stack
5179 #-------------------------------------------------------
5180 if ( $type eq 'i' && $token =~ /^(package\s+)(\w.*)$/ ) {
5181 $token = $self->markup_html_element( $1, 'k' );
5182 push @colored_tokens, $token;
5185 $self->add_toc_item( "$token", 'package' );
5186 $rpackage_stack->[$level] = $token;
5189 $token = $self->markup_html_element( $token, $type );
5190 push @colored_tokens, $token;
5192 return ( \@colored_tokens );
5195 sub markup_html_element {
5197 my ( $token, $type ) = @_;
5199 return $token if ( $type eq 'b' ); # skip a blank token
5200 return $token if ( $token =~ /^\s*$/ ); # skip a blank line
5201 $token = escape_html($token);
5203 # get the short abbreviation for this token type
5204 my $short_name = $token_short_names{$type};
5205 if ( !defined($short_name) ) {
5206 $short_name = "pu"; # punctuation is default
5209 # handle style sheets..
5210 if ( !$rOpts->{'nohtml-style-sheets'} ) {
5211 if ( $short_name ne 'pu' ) {
5212 $token = qq(<span class="$short_name">) . $token . "</span>";
5216 # handle no style sheets..
5218 my $color = $html_color{$short_name};
5220 if ( $color && ( $color ne $rOpts->{'html-color-punctuation'} ) ) {
5221 $token = qq(<font color="$color">) . $token . "</font>";
5223 if ( $html_italic{$short_name} ) { $token = "<i>$token</i>" }
5224 if ( $html_bold{$short_name} ) { $token = "<b>$token</b>" }
5232 if ($missing_html_entities) {
5233 $token =~ s/\&/&/g;
5234 $token =~ s/\</</g;
5235 $token =~ s/\>/>/g;
5236 $token =~ s/\"/"/g;
5239 HTML::Entities::encode_entities($token);
5244 sub finish_formatting {
5246 # called after last line
5248 $self->close_html_file();
5255 return unless $self->{_html_file_opened};
5256 my $html_pre_fh = $self->{_html_pre_fh};
5257 my ($line_of_tokens) = @_;
5258 my $line_type = $line_of_tokens->{_line_type};
5259 my $input_line = $line_of_tokens->{_line_text};
5260 my $line_number = $line_of_tokens->{_line_number};
5263 # markup line of code..
5265 if ( $line_type eq 'CODE' ) {
5266 my $rtoken_type = $line_of_tokens->{_rtoken_type};
5267 my $rtokens = $line_of_tokens->{_rtokens};
5268 my $rlevels = $line_of_tokens->{_rlevels};
5270 if ( $input_line =~ /(^\s*)/ ) {
5276 my ($rcolored_tokens) =
5277 $self->markup_tokens( $rtokens, $rtoken_type, $rlevels );
5278 $html_line .= join '', @$rcolored_tokens;
5281 # markup line of non-code..
5284 if ( $line_type eq 'HERE' ) { $line_character = 'H' }
5285 elsif ( $line_type eq 'HERE_END' ) { $line_character = 'h' }
5286 elsif ( $line_type eq 'FORMAT' ) { $line_character = 'H' }
5287 elsif ( $line_type eq 'FORMAT_END' ) { $line_character = 'h' }
5288 elsif ( $line_type eq 'SYSTEM' ) { $line_character = 'c' }
5289 elsif ( $line_type eq 'END_START' ) {
5290 $line_character = 'k';
5291 $self->add_toc_item( '__END__', '__END__' );
5293 elsif ( $line_type eq 'DATA_START' ) {
5294 $line_character = 'k';
5295 $self->add_toc_item( '__DATA__', '__DATA__' );
5297 elsif ( $line_type =~ /^POD/ ) {
5298 $line_character = 'P';
5299 if ( $rOpts->{'pod2html'} ) {
5300 my $html_pod_fh = $self->{_html_pod_fh};
5301 if ( $line_type eq 'POD_START' ) {
5303 my $rpre_string_stack = $self->{_rpre_string_stack};
5304 my $rpre_string = $rpre_string_stack->[-1];
5306 # if we have written any non-blank lines to the
5307 # current pre section, start writing to a new output
5309 if ( $$rpre_string =~ /\S/ ) {
5312 Perl::Tidy::IOScalar->new( \$pre_string, 'w' );
5313 $self->{_html_pre_fh} = $html_pre_fh;
5314 push @$rpre_string_stack, \$pre_string;
5316 # leave a marker in the pod stream so we know
5317 # where to put the pre section we just
5319 my $for_html = '=for html'; # don't confuse pod utils
5320 $html_pod_fh->print(<<EOM);
5323 <!-- pERLTIDY sECTION -->
5328 # otherwise, just clear the current string and start
5332 $html_pod_fh->print("\n");
5335 $html_pod_fh->print( $input_line . "\n" );
5336 if ( $line_type eq 'POD_END' ) {
5337 $self->{_pod_cut_count}++;
5338 $html_pod_fh->print("\n");
5343 else { $line_character = 'Q' }
5344 $html_line = $self->markup_html_element( $input_line, $line_character );
5347 # add the line number if requested
5348 if ( $rOpts->{'html-line-numbers'} ) {
5350 ( $line_number < 10 ) ? " "
5351 : ( $line_number < 100 ) ? " "
5352 : ( $line_number < 1000 ) ? " "
5354 $html_line = $extra_space . $line_number . " " . $html_line;
5358 $html_pre_fh->print("$html_line\n");
5361 #####################################################################
5363 # The Perl::Tidy::Formatter package adds indentation, whitespace, and
5364 # line breaks to the token stream
5366 # WARNING: This is not a real class for speed reasons. Only one
5367 # Formatter may be used.
5369 #####################################################################
5371 package Perl::Tidy::Formatter;
5375 # Caution: these debug flags produce a lot of output
5376 # They should all be 0 except when debugging small scripts
5377 use constant FORMATTER_DEBUG_FLAG_BOND => 0;
5378 use constant FORMATTER_DEBUG_FLAG_BREAK => 0;
5379 use constant FORMATTER_DEBUG_FLAG_CI => 0;
5380 use constant FORMATTER_DEBUG_FLAG_FLUSH => 0;
5381 use constant FORMATTER_DEBUG_FLAG_FORCE => 0;
5382 use constant FORMATTER_DEBUG_FLAG_LIST => 0;
5383 use constant FORMATTER_DEBUG_FLAG_NOBREAK => 0;
5384 use constant FORMATTER_DEBUG_FLAG_OUTPUT => 0;
5385 use constant FORMATTER_DEBUG_FLAG_SPARSE => 0;
5386 use constant FORMATTER_DEBUG_FLAG_STORE => 0;
5387 use constant FORMATTER_DEBUG_FLAG_UNDOBP => 0;
5388 use constant FORMATTER_DEBUG_FLAG_WHITE => 0;
5390 my $debug_warning = sub {
5391 print "FORMATTER_DEBUGGING with key $_[0]\n";
5394 FORMATTER_DEBUG_FLAG_BOND && $debug_warning->('BOND');
5395 FORMATTER_DEBUG_FLAG_BREAK && $debug_warning->('BREAK');
5396 FORMATTER_DEBUG_FLAG_CI && $debug_warning->('CI');
5397 FORMATTER_DEBUG_FLAG_FLUSH && $debug_warning->('FLUSH');
5398 FORMATTER_DEBUG_FLAG_FORCE && $debug_warning->('FORCE');
5399 FORMATTER_DEBUG_FLAG_LIST && $debug_warning->('LIST');
5400 FORMATTER_DEBUG_FLAG_NOBREAK && $debug_warning->('NOBREAK');
5401 FORMATTER_DEBUG_FLAG_OUTPUT && $debug_warning->('OUTPUT');
5402 FORMATTER_DEBUG_FLAG_SPARSE && $debug_warning->('SPARSE');
5403 FORMATTER_DEBUG_FLAG_STORE && $debug_warning->('STORE');
5404 FORMATTER_DEBUG_FLAG_UNDOBP && $debug_warning->('UNDOBP');
5405 FORMATTER_DEBUG_FLAG_WHITE && $debug_warning->('WHITE');
5412 $max_gnu_stack_index
5413 $gnu_position_predictor
5414 $line_start_index_to_go
5415 $last_indentation_written
5416 $last_unadjusted_indentation
5419 $saw_VERSION_in_this_file
5424 $gnu_sequence_number
5425 $last_output_indentation
5431 @type_sequence_to_go
5432 @container_environment_to_go
5433 @bond_strength_to_go
5434 @forced_breakpoint_to_go
5437 @leading_spaces_to_go
5438 @reduced_spaces_to_go
5439 @matching_token_to_go
5441 @nesting_blocks_to_go
5443 @nesting_depth_to_go
5445 @old_breakpoint_to_go
5449 %saved_opening_indentation
5452 $comma_count_in_batch
5453 $old_line_count_in_batch
5454 $last_nonblank_index_to_go
5455 $last_nonblank_type_to_go
5456 $last_nonblank_token_to_go
5457 $last_last_nonblank_index_to_go
5458 $last_last_nonblank_type_to_go
5459 $last_last_nonblank_token_to_go
5460 @nonblank_lines_at_depth
5464 $in_format_skipping_section
5465 $format_skipping_pattern_begin
5466 $format_skipping_pattern_end
5468 $forced_breakpoint_count
5469 $forced_breakpoint_undo_count
5470 @forced_breakpoint_undo_stack
5471 %postponed_breakpoint
5475 $first_embedded_tab_at
5476 $last_embedded_tab_at
5477 $deleted_semicolon_count
5478 $first_deleted_semicolon_at
5479 $last_deleted_semicolon_at
5480 $added_semicolon_count
5481 $first_added_semicolon_at
5482 $last_added_semicolon_at
5483 $first_tabbing_disagreement
5484 $last_tabbing_disagreement
5485 $in_tabbing_disagreement
5486 $tabbing_disagreement_count
5490 $last_line_leading_type
5491 $last_line_leading_level
5492 $last_last_line_leading_level
5495 %block_opening_line_number
5496 $csc_new_statement_ok
5497 $accumulating_text_for_block
5499 $rleading_block_if_elsif_text
5500 $leading_block_text_level
5501 $leading_block_text_length_exceeded
5502 $leading_block_text_line_length
5503 $leading_block_text_line_number
5504 $closing_side_comment_prefix_pattern
5505 $closing_side_comment_list_pattern
5507 $last_nonblank_token
5509 $last_last_nonblank_token
5510 $last_last_nonblank_type
5511 $last_nonblank_block_type
5514 %is_if_brace_follower
5515 %space_after_keyword
5518 %is_last_next_redo_return
5519 %is_other_brace_follower
5520 %is_else_brace_follower
5521 %is_anon_sub_brace_follower
5522 %is_anon_sub_1_brace_follower
5524 %is_sort_map_grep_eval
5525 %is_sort_map_grep_eval_do
5526 %is_block_without_semicolon
5531 %is_if_unless_and_or_last_next_redo_return
5532 %is_until_while_for_if_elsif_else
5538 $is_static_block_comment
5539 $index_start_one_line_block
5540 $semicolons_before_block_self_destruct
5541 $index_max_forced_break
5544 $vertical_aligner_object
5549 $last_line_had_side_comment
5552 $static_block_comment_pattern
5553 $static_side_comment_pattern
5554 %opening_vertical_tightness
5555 %closing_vertical_tightness
5556 %closing_token_indentation
5558 %opening_token_right
5559 %stack_opening_token
5560 %stack_closing_token
5562 $block_brace_vertical_tightness_pattern
5565 $rOpts_add_whitespace
5566 $rOpts_block_brace_tightness
5567 $rOpts_block_brace_vertical_tightness
5568 $rOpts_brace_left_and_indent
5569 $rOpts_comma_arrow_breakpoints
5570 $rOpts_break_at_old_keyword_breakpoints
5571 $rOpts_break_at_old_comma_breakpoints
5572 $rOpts_break_at_old_logical_breakpoints
5573 $rOpts_break_at_old_ternary_breakpoints
5574 $rOpts_closing_side_comment_else_flag
5575 $rOpts_closing_side_comment_maximum_text
5576 $rOpts_continuation_indentation
5578 $rOpts_delete_old_whitespace
5579 $rOpts_fuzzy_line_length
5580 $rOpts_indent_columns
5581 $rOpts_line_up_parentheses
5582 $rOpts_maximum_fields_per_table
5583 $rOpts_maximum_line_length
5584 $rOpts_short_concatenation_item_length
5585 $rOpts_swallow_optional_blank_lines
5586 $rOpts_ignore_old_breakpoints
5587 $rOpts_format_skipping
5588 $rOpts_space_function_paren
5589 $rOpts_space_keyword_paren
5591 $half_maximum_line_length
5595 %is_keyword_returning_list
5599 %right_bond_strength
5616 # default list of block types for which -bli would apply
5617 $bli_list_string = 'if else elsif unless while for foreach do : sub';
5620 .. :: << >> ** && .. || // -> => += -= .= %= &= |= ^= *= <>
5621 <= >= == =~ !~ != ++ -- /= x=
5623 @is_digraph{@_} = (1) x scalar(@_);
5625 @_ = qw( ... **= <<= >>= &&= ||= //= <=> );
5626 @is_trigraph{@_} = (1) x scalar(@_);
5629 = **= += *= &= <<= &&=
5630 -= /= |= >>= ||= //=
5634 @is_assignment{@_} = (1) x scalar(@_);
5644 @is_keyword_returning_list{@_} = (1) x scalar(@_);
5646 @_ = qw(is if unless and or err last next redo return);
5647 @is_if_unless_and_or_last_next_redo_return{@_} = (1) x scalar(@_);
5649 # always break after a closing curly of these block types:
5650 @_ = qw(until while for if elsif else);
5651 @is_until_while_for_if_elsif_else{@_} = (1) x scalar(@_);
5653 @_ = qw(last next redo return);
5654 @is_last_next_redo_return{@_} = (1) x scalar(@_);
5656 @_ = qw(sort map grep);
5657 @is_sort_map_grep{@_} = (1) x scalar(@_);
5659 @_ = qw(sort map grep eval);
5660 @is_sort_map_grep_eval{@_} = (1) x scalar(@_);
5662 @_ = qw(sort map grep eval do);
5663 @is_sort_map_grep_eval_do{@_} = (1) x scalar(@_);
5666 @is_if_unless{@_} = (1) x scalar(@_);
5668 @_ = qw(and or err);
5669 @is_and_or{@_} = (1) x scalar(@_);
5671 # Identify certain operators which often occur in chains.
5672 # Note: the minus (-) causes a side effect of padding of the first line in
5673 # something like this (by sub set_logical_padding):
5674 # Checkbutton => 'Transmission checked',
5675 # -variable => \$TRANS
5676 # This usually improves appearance so it seems ok.
5677 @_ = qw(&& || and or : ? . + - * /);
5678 @is_chain_operator{@_} = (1) x scalar(@_);
5680 # We can remove semicolons after blocks preceded by these keywords
5681 @_ = qw(BEGIN END CHECK INIT AUTOLOAD DESTROY continue if elsif else
5682 unless while until for foreach);
5683 @is_block_without_semicolon{@_} = (1) x scalar(@_);
5685 # 'L' is token for opening { at hash key
5687 @is_opening_type{@_} = (1) x scalar(@_);
5689 # 'R' is token for closing } at hash key
5691 @is_closing_type{@_} = (1) x scalar(@_);
5694 @is_opening_token{@_} = (1) x scalar(@_);
5697 @is_closing_token{@_} = (1) x scalar(@_);
5701 use constant WS_YES => 1;
5702 use constant WS_OPTIONAL => 0;
5703 use constant WS_NO => -1;
5705 # Token bond strengths.
5706 use constant NO_BREAK => 10000;
5707 use constant VERY_STRONG => 100;
5708 use constant STRONG => 2.1;
5709 use constant NOMINAL => 1.1;
5710 use constant WEAK => 0.8;
5711 use constant VERY_WEAK => 0.55;
5713 # values for testing indexes in output array
5714 use constant UNDEFINED_INDEX => -1;
5716 # Maximum number of little messages; probably need not be changed.
5717 use constant MAX_NAG_MESSAGES => 6;
5719 # increment between sequence numbers for each type
5720 # For example, ?: pairs might have numbers 7,11,15,...
5721 use constant TYPE_SEQUENCE_INCREMENT => 4;
5725 # methods to count instances
5727 sub get_count { $_count; }
5728 sub _increment_count { ++$_count }
5729 sub _decrement_count { --$_count }
5734 # trim leading and trailing whitespace from a string
5742 # given a string containing words separated by whitespace,
5743 # return the list of words
5748 return split( /\s+/, $str );
5751 # interface to Perl::Tidy::Logger routines
5753 if ($logger_object) {
5754 $logger_object->warning(@_);
5759 if ($logger_object) {
5760 $logger_object->complain(@_);
5764 sub write_logfile_entry {
5765 if ($logger_object) {
5766 $logger_object->write_logfile_entry(@_);
5771 if ($logger_object) {
5772 $logger_object->black_box(@_);
5776 sub report_definite_bug {
5777 if ($logger_object) {
5778 $logger_object->report_definite_bug();
5782 sub get_saw_brace_error {
5783 if ($logger_object) {
5784 $logger_object->get_saw_brace_error();
5788 sub we_are_at_the_last_line {
5789 if ($logger_object) {
5790 $logger_object->we_are_at_the_last_line();
5794 # interface to Perl::Tidy::Diagnostics routine
5795 sub write_diagnostics {
5797 if ($diagnostics_object) {
5798 $diagnostics_object->write_diagnostics(@_);
5802 sub get_added_semicolon_count {
5804 return $added_semicolon_count;
5808 $_[0]->_decrement_count();
5815 # we are given an object with a write_line() method to take lines
5817 sink_object => undef,
5818 diagnostics_object => undef,
5819 logger_object => undef,
5821 my %args = ( %defaults, @_ );
5823 $logger_object = $args{logger_object};
5824 $diagnostics_object = $args{diagnostics_object};
5826 # we create another object with a get_line() and peek_ahead() method
5827 my $sink_object = $args{sink_object};
5828 $file_writer_object =
5829 Perl::Tidy::FileWriter->new( $sink_object, $rOpts, $logger_object );
5831 # initialize the leading whitespace stack to negative levels
5832 # so that we can never run off the end of the stack
5833 $gnu_position_predictor = 0; # where the current token is predicted to be
5834 $max_gnu_stack_index = 0;
5835 $max_gnu_item_index = -1;
5836 $gnu_stack[0] = new_lp_indentation_item( 0, -1, -1, 0, 0 );
5837 @gnu_item_list = ();
5838 $last_output_indentation = 0;
5839 $last_indentation_written = 0;
5840 $last_unadjusted_indentation = 0;
5841 $last_leading_token = "";
5843 $saw_VERSION_in_this_file = !$rOpts->{'pass-version-line'};
5844 $saw_END_or_DATA_ = 0;
5846 @block_type_to_go = ();
5847 @type_sequence_to_go = ();
5848 @container_environment_to_go = ();
5849 @bond_strength_to_go = ();
5850 @forced_breakpoint_to_go = ();
5851 @lengths_to_go = (); # line length to start of ith token
5853 @matching_token_to_go = ();
5854 @mate_index_to_go = ();
5855 @nesting_blocks_to_go = ();
5856 @ci_levels_to_go = ();
5857 @nesting_depth_to_go = (0);
5858 @nobreak_to_go = ();
5859 @old_breakpoint_to_go = ();
5862 @leading_spaces_to_go = ();
5863 @reduced_spaces_to_go = ();
5866 @has_broken_sublist = ();
5867 @want_comma_break = ();
5870 $first_tabbing_disagreement = 0;
5871 $last_tabbing_disagreement = 0;
5872 $tabbing_disagreement_count = 0;
5873 $in_tabbing_disagreement = 0;
5874 $input_line_tabbing = undef;
5876 $last_line_type = "";
5877 $last_last_line_leading_level = 0;
5878 $last_line_leading_level = 0;
5879 $last_line_leading_type = '#';
5881 $last_nonblank_token = ';';
5882 $last_nonblank_type = ';';
5883 $last_last_nonblank_token = ';';
5884 $last_last_nonblank_type = ';';
5885 $last_nonblank_block_type = "";
5886 $last_output_level = 0;
5887 $looking_for_else = 0;
5888 $embedded_tab_count = 0;
5889 $first_embedded_tab_at = 0;
5890 $last_embedded_tab_at = 0;
5891 $deleted_semicolon_count = 0;
5892 $first_deleted_semicolon_at = 0;
5893 $last_deleted_semicolon_at = 0;
5894 $added_semicolon_count = 0;
5895 $first_added_semicolon_at = 0;
5896 $last_added_semicolon_at = 0;
5897 $last_line_had_side_comment = 0;
5898 $is_static_block_comment = 0;
5899 %postponed_breakpoint = ();
5901 # variables for adding side comments
5902 %block_leading_text = ();
5903 %block_opening_line_number = ();
5904 $csc_new_statement_ok = 1;
5906 %saved_opening_indentation = ();
5907 $in_format_skipping_section = 0;
5909 reset_block_text_accumulator();
5911 prepare_for_new_input_lines();
5913 $vertical_aligner_object =
5914 Perl::Tidy::VerticalAligner->initialize( $rOpts, $file_writer_object,
5915 $logger_object, $diagnostics_object );
5917 if ( $rOpts->{'entab-leading-whitespace'} ) {
5918 write_logfile_entry(
5919 "Leading whitespace will be entabbed with $rOpts->{'entab-leading-whitespace'} spaces per tab\n"
5922 elsif ( $rOpts->{'tabs'} ) {
5923 write_logfile_entry("Indentation will be with a tab character\n");
5926 write_logfile_entry(
5927 "Indentation will be with $rOpts->{'indent-columns'} spaces\n");
5930 # This was the start of a formatter referent, but object-oriented
5931 # coding has turned out to be too slow here.
5932 $formatter_self = {};
5934 bless $formatter_self, $class;
5936 # Safety check..this is not a class yet
5937 if ( _increment_count() > 1 ) {
5939 "Attempt to create more than 1 object in $class, which is not a true class yet\n";
5941 return $formatter_self;
5944 sub prepare_for_new_input_lines {
5946 $gnu_sequence_number++; # increment output batch counter
5947 %last_gnu_equals = ();
5948 %gnu_comma_count = ();
5949 %gnu_arrow_count = ();
5950 $line_start_index_to_go = 0;
5951 $max_gnu_item_index = UNDEFINED_INDEX;
5952 $index_max_forced_break = UNDEFINED_INDEX;
5953 $max_index_to_go = UNDEFINED_INDEX;
5954 $last_nonblank_index_to_go = UNDEFINED_INDEX;
5955 $last_nonblank_type_to_go = '';
5956 $last_nonblank_token_to_go = '';
5957 $last_last_nonblank_index_to_go = UNDEFINED_INDEX;
5958 $last_last_nonblank_type_to_go = '';
5959 $last_last_nonblank_token_to_go = '';
5960 $forced_breakpoint_count = 0;
5961 $forced_breakpoint_undo_count = 0;
5962 $rbrace_follower = undef;
5963 $lengths_to_go[0] = 0;
5964 $old_line_count_in_batch = 1;
5965 $comma_count_in_batch = 0;
5966 $starting_in_quote = 0;
5968 destroy_one_line_block();
5974 my ($line_of_tokens) = @_;
5976 my $line_type = $line_of_tokens->{_line_type};
5977 my $input_line = $line_of_tokens->{_line_text};
5979 my $want_blank_line_next = 0;
5981 # _line_type codes are:
5982 # SYSTEM - system-specific code before hash-bang line
5983 # CODE - line of perl code (including comments)
5984 # POD_START - line starting pod, such as '=head'
5985 # POD - pod documentation text
5986 # POD_END - last line of pod section, '=cut'
5987 # HERE - text of here-document
5988 # HERE_END - last line of here-doc (target word)
5989 # FORMAT - format section
5990 # FORMAT_END - last line of format section, '.'
5991 # DATA_START - __DATA__ line
5992 # DATA - unidentified text following __DATA__
5993 # END_START - __END__ line
5994 # END - unidentified text following __END__
5995 # ERROR - we are in big trouble, probably not a perl script
5997 # handle line of code..
5998 if ( $line_type eq 'CODE' ) {
6000 # let logger see all non-blank lines of code
6001 if ( $input_line !~ /^\s*$/ ) {
6002 my $output_line_number =
6003 $vertical_aligner_object->get_output_line_number();
6004 black_box( $line_of_tokens, $output_line_number );
6006 print_line_of_tokens($line_of_tokens);
6009 # handle line of non-code..
6015 if ( $line_type =~ /^POD/ ) {
6017 # Pod docs should have a preceding blank line. But be
6018 # very careful in __END__ and __DATA__ sections, because:
6019 # 1. the user may be using this section for any purpose whatsoever
6020 # 2. the blank counters are not active there
6021 # It should be safe to request a blank line between an
6022 # __END__ or __DATA__ and an immediately following '=head'
6023 # type line, (types END_START and DATA_START), but not for
6024 # any other lines of type END or DATA.
6025 if ( $rOpts->{'delete-pod'} ) { $skip_line = 1; }
6026 if ( $rOpts->{'tee-pod'} ) { $tee_line = 1; }
6028 && $line_type eq 'POD_START'
6029 && $last_line_type !~ /^(END|DATA)$/ )
6034 # patch to put a blank line after =cut
6035 # (required by podchecker)
6036 if ( $line_type eq 'POD_END' && !$saw_END_or_DATA_ ) {
6037 $file_writer_object->reset_consecutive_blank_lines();
6038 $want_blank_line_next = 1;
6042 # leave the blank counters in a predictable state
6043 # after __END__ or __DATA__
6044 elsif ( $line_type =~ /^(END_START|DATA_START)$/ ) {
6045 $file_writer_object->reset_consecutive_blank_lines();
6046 $saw_END_or_DATA_ = 1;
6049 # write unindented non-code line
6050 if ( !$skip_line ) {
6051 if ($tee_line) { $file_writer_object->tee_on() }
6052 write_unindented_line($input_line);
6053 if ($tee_line) { $file_writer_object->tee_off() }
6054 if ($want_blank_line_next) { want_blank_line(); }
6057 $last_line_type = $line_type;
6060 sub create_one_line_block {
6061 $index_start_one_line_block = $_[0];
6062 $semicolons_before_block_self_destruct = $_[1];
6065 sub destroy_one_line_block {
6066 $index_start_one_line_block = UNDEFINED_INDEX;
6067 $semicolons_before_block_self_destruct = 0;
6070 sub leading_spaces_to_go {
6072 # return the number of indentation spaces for a token in the output stream;
6073 # these were previously stored by 'set_leading_whitespace'.
6075 return get_SPACES( $leading_spaces_to_go[ $_[0] ] );
6081 # return the number of leading spaces associated with an indentation
6082 # variable $indentation is either a constant number of spaces or an object
6083 # with a get_SPACES method.
6084 my $indentation = shift;
6085 return ref($indentation) ? $indentation->get_SPACES() : $indentation;
6088 sub get_RECOVERABLE_SPACES {
6090 # return the number of spaces (+ means shift right, - means shift left)
6091 # that we would like to shift a group of lines with the same indentation
6092 # to get them to line up with their opening parens
6093 my $indentation = shift;
6094 return ref($indentation) ? $indentation->get_RECOVERABLE_SPACES() : 0;
6097 sub get_AVAILABLE_SPACES_to_go {
6099 my $item = $leading_spaces_to_go[ $_[0] ];
6101 # return the number of available leading spaces associated with an
6102 # indentation variable. $indentation is either a constant number of
6103 # spaces or an object with a get_AVAILABLE_SPACES method.
6104 return ref($item) ? $item->get_AVAILABLE_SPACES() : 0;
6107 sub new_lp_indentation_item {
6109 # this is an interface to the IndentationItem class
6110 my ( $spaces, $level, $ci_level, $available_spaces, $align_paren ) = @_;
6112 # A negative level implies not to store the item in the item_list
6114 if ( $level >= 0 ) { $index = ++$max_gnu_item_index; }
6116 my $item = Perl::Tidy::IndentationItem->new(
6118 $ci_level, $available_spaces,
6119 $index, $gnu_sequence_number,
6120 $align_paren, $max_gnu_stack_index,
6121 $line_start_index_to_go,
6124 if ( $level >= 0 ) {
6125 $gnu_item_list[$max_gnu_item_index] = $item;
6131 sub set_leading_whitespace {
6133 # This routine defines leading whitespace
6134 # given: the level and continuation_level of a token,
6135 # define: space count of leading string which would apply if it
6136 # were the first token of a new line.
6138 my ( $level, $ci_level, $in_continued_quote ) = @_;
6140 # modify for -bli, which adds one continuation indentation for
6142 if ( $rOpts_brace_left_and_indent
6143 && $max_index_to_go == 0
6144 && $block_type_to_go[$max_index_to_go] =~ /$bli_pattern/o )
6149 # patch to avoid trouble when input file has negative indentation.
6150 # other logic should catch this error.
6151 if ( $level < 0 ) { $level = 0 }
6153 #-------------------------------------------
6154 # handle the standard indentation scheme
6155 #-------------------------------------------
6156 unless ($rOpts_line_up_parentheses) {
6158 $ci_level * $rOpts_continuation_indentation +
6159 $level * $rOpts_indent_columns;
6161 ( $ci_level == 0 ) ? 0 : $rOpts_continuation_indentation;
6163 if ($in_continued_quote) {
6167 $leading_spaces_to_go[$max_index_to_go] = $space_count;
6168 $reduced_spaces_to_go[$max_index_to_go] = $space_count - $ci_spaces;
6172 #-------------------------------------------------------------
6173 # handle case of -lp indentation..
6174 #-------------------------------------------------------------
6176 # The continued_quote flag means that this is the first token of a
6177 # line, and it is the continuation of some kind of multi-line quote
6178 # or pattern. It requires special treatment because it must have no
6179 # added leading whitespace. So we create a special indentation item
6180 # which is not in the stack.
6181 if ($in_continued_quote) {
6182 my $space_count = 0;
6183 my $available_space = 0;
6184 $level = -1; # flag to prevent storing in item_list
6185 $leading_spaces_to_go[$max_index_to_go] =
6186 $reduced_spaces_to_go[$max_index_to_go] =
6187 new_lp_indentation_item( $space_count, $level, $ci_level,
6188 $available_space, 0 );
6192 # get the top state from the stack
6193 my $space_count = $gnu_stack[$max_gnu_stack_index]->get_SPACES();
6194 my $current_level = $gnu_stack[$max_gnu_stack_index]->get_LEVEL();
6195 my $current_ci_level = $gnu_stack[$max_gnu_stack_index]->get_CI_LEVEL();
6197 my $type = $types_to_go[$max_index_to_go];
6198 my $token = $tokens_to_go[$max_index_to_go];
6199 my $total_depth = $nesting_depth_to_go[$max_index_to_go];
6201 if ( $type eq '{' || $type eq '(' ) {
6203 $gnu_comma_count{ $total_depth + 1 } = 0;
6204 $gnu_arrow_count{ $total_depth + 1 } = 0;
6206 # If we come to an opening token after an '=' token of some type,
6207 # see if it would be helpful to 'break' after the '=' to save space
6208 my $last_equals = $last_gnu_equals{$total_depth};
6209 if ( $last_equals && $last_equals > $line_start_index_to_go ) {
6211 # find the position if we break at the '='
6212 my $i_test = $last_equals;
6213 if ( $types_to_go[ $i_test + 1 ] eq 'b' ) { $i_test++ }
6216 ##my $too_close = ($i_test==$max_index_to_go-1);
6218 my $test_position = total_line_length( $i_test, $max_index_to_go );
6222 # the equals is not just before an open paren (testing)
6225 # if we are beyond the midpoint
6226 $gnu_position_predictor > $half_maximum_line_length
6228 # or we are beyont the 1/4 point and there was an old
6229 # break at the equals
6231 $gnu_position_predictor > $half_maximum_line_length / 2
6233 $old_breakpoint_to_go[$last_equals]
6234 || ( $last_equals > 0
6235 && $old_breakpoint_to_go[ $last_equals - 1 ] )
6236 || ( $last_equals > 1
6237 && $types_to_go[ $last_equals - 1 ] eq 'b'
6238 && $old_breakpoint_to_go[ $last_equals - 2 ] )
6244 # then make the switch -- note that we do not set a real
6245 # breakpoint here because we may not really need one; sub
6246 # scan_list will do that if necessary
6247 $line_start_index_to_go = $i_test + 1;
6248 $gnu_position_predictor = $test_position;
6253 # Check for decreasing depth ..
6254 # Note that one token may have both decreasing and then increasing
6255 # depth. For example, (level, ci) can go from (1,1) to (2,0). So,
6256 # in this example we would first go back to (1,0) then up to (2,0)
6258 if ( $level < $current_level || $ci_level < $current_ci_level ) {
6260 # loop to find the first entry at or completely below this level
6261 my ( $lev, $ci_lev );
6263 if ($max_gnu_stack_index) {
6265 # save index of token which closes this level
6266 $gnu_stack[$max_gnu_stack_index]->set_CLOSED($max_index_to_go);
6268 # Undo any extra indentation if we saw no commas
6269 my $available_spaces =
6270 $gnu_stack[$max_gnu_stack_index]->get_AVAILABLE_SPACES();
6272 my $comma_count = 0;
6273 my $arrow_count = 0;
6274 if ( $type eq '}' || $type eq ')' ) {
6275 $comma_count = $gnu_comma_count{$total_depth};
6276 $arrow_count = $gnu_arrow_count{$total_depth};
6277 $comma_count = 0 unless $comma_count;
6278 $arrow_count = 0 unless $arrow_count;
6280 $gnu_stack[$max_gnu_stack_index]->set_COMMA_COUNT($comma_count);
6281 $gnu_stack[$max_gnu_stack_index]->set_ARROW_COUNT($arrow_count);
6283 if ( $available_spaces > 0 ) {
6285 if ( $comma_count <= 0 || $arrow_count > 0 ) {
6287 my $i = $gnu_stack[$max_gnu_stack_index]->get_INDEX();
6289 $gnu_stack[$max_gnu_stack_index]
6290 ->get_SEQUENCE_NUMBER();
6292 # Be sure this item was created in this batch. This
6293 # should be true because we delete any available
6294 # space from open items at the end of each batch.
6295 if ( $gnu_sequence_number != $seqno
6296 || $i > $max_gnu_item_index )
6299 "Program bug with -lp. seqno=$seqno should be $gnu_sequence_number and i=$i should be less than max=$max_gnu_item_index\n"
6301 report_definite_bug();
6305 if ( $arrow_count == 0 ) {
6307 ->permanently_decrease_AVAILABLE_SPACES(
6312 ->tentatively_decrease_AVAILABLE_SPACES(
6319 $j <= $max_gnu_item_index ;
6324 ->decrease_SPACES($available_spaces);
6331 --$max_gnu_stack_index;
6332 $lev = $gnu_stack[$max_gnu_stack_index]->get_LEVEL();
6333 $ci_lev = $gnu_stack[$max_gnu_stack_index]->get_CI_LEVEL();
6335 # stop when we reach a level at or below the current level
6336 if ( $lev <= $level && $ci_lev <= $ci_level ) {
6338 $gnu_stack[$max_gnu_stack_index]->get_SPACES();
6339 $current_level = $lev;
6340 $current_ci_level = $ci_lev;
6345 # reached bottom of stack .. should never happen because
6346 # only negative levels can get here, and $level was forced
6347 # to be positive above.
6350 "program bug with -lp: stack_error. level=$level; lev=$lev; ci_level=$ci_level; ci_lev=$ci_lev; rerun with -nlp\n"
6352 report_definite_bug();
6358 # handle increasing depth
6359 if ( $level > $current_level || $ci_level > $current_ci_level ) {
6361 # Compute the standard incremental whitespace. This will be
6362 # the minimum incremental whitespace that will be used. This
6363 # choice results in a smooth transition between the gnu-style
6364 # and the standard style.
6365 my $standard_increment =
6366 ( $level - $current_level ) * $rOpts_indent_columns +
6367 ( $ci_level - $current_ci_level ) * $rOpts_continuation_indentation;
6369 # Now we have to define how much extra incremental space
6370 # ("$available_space") we want. This extra space will be
6371 # reduced as necessary when long lines are encountered or when
6372 # it becomes clear that we do not have a good list.
6373 my $available_space = 0;
6374 my $align_paren = 0;
6377 # initialization on empty stack..
6378 if ( $max_gnu_stack_index == 0 ) {
6379 $space_count = $level * $rOpts_indent_columns;
6382 # if this is a BLOCK, add the standard increment
6383 elsif ($last_nonblank_block_type) {
6384 $space_count += $standard_increment;
6387 # if last nonblank token was not structural indentation,
6388 # just use standard increment
6389 elsif ( $last_nonblank_type ne '{' ) {
6390 $space_count += $standard_increment;
6393 # otherwise use the space to the first non-blank level change token
6396 $space_count = $gnu_position_predictor;
6398 my $min_gnu_indentation =
6399 $gnu_stack[$max_gnu_stack_index]->get_SPACES();
6401 $available_space = $space_count - $min_gnu_indentation;
6402 if ( $available_space >= $standard_increment ) {
6403 $min_gnu_indentation += $standard_increment;
6405 elsif ( $available_space > 1 ) {
6406 $min_gnu_indentation += $available_space + 1;
6408 elsif ( $last_nonblank_token =~ /^[\{\[\(]$/ ) {
6409 if ( ( $tightness{$last_nonblank_token} < 2 ) ) {
6410 $min_gnu_indentation += 2;
6413 $min_gnu_indentation += 1;
6417 $min_gnu_indentation += $standard_increment;
6419 $available_space = $space_count - $min_gnu_indentation;
6421 if ( $available_space < 0 ) {
6422 $space_count = $min_gnu_indentation;
6423 $available_space = 0;
6428 # update state, but not on a blank token
6429 if ( $types_to_go[$max_index_to_go] ne 'b' ) {
6431 $gnu_stack[$max_gnu_stack_index]->set_HAVE_CHILD(1);
6433 ++$max_gnu_stack_index;
6434 $gnu_stack[$max_gnu_stack_index] =
6435 new_lp_indentation_item( $space_count, $level, $ci_level,
6436 $available_space, $align_paren );
6438 # If the opening paren is beyond the half-line length, then
6439 # we will use the minimum (standard) indentation. This will
6440 # help avoid problems associated with running out of space
6441 # near the end of a line. As a result, in deeply nested
6442 # lists, there will be some indentations which are limited
6443 # to this minimum standard indentation. But the most deeply
6444 # nested container will still probably be able to shift its
6445 # parameters to the right for proper alignment, so in most
6446 # cases this will not be noticable.
6447 if ( $available_space > 0
6448 && $space_count > $half_maximum_line_length )
6450 $gnu_stack[$max_gnu_stack_index]
6451 ->tentatively_decrease_AVAILABLE_SPACES($available_space);
6456 # Count commas and look for non-list characters. Once we see a
6457 # non-list character, we give up and don't look for any more commas.
6458 if ( $type eq '=>' ) {
6459 $gnu_arrow_count{$total_depth}++;
6461 # tentatively treating '=>' like '=' for estimating breaks
6462 # TODO: this could use some experimentation
6463 $last_gnu_equals{$total_depth} = $max_index_to_go;
6466 elsif ( $type eq ',' ) {
6467 $gnu_comma_count{$total_depth}++;
6470 elsif ( $is_assignment{$type} ) {
6471 $last_gnu_equals{$total_depth} = $max_index_to_go;
6474 # this token might start a new line
6475 # if this is a non-blank..
6476 if ( $type ne 'b' ) {
6481 # this is the first nonblank token of the line
6482 $max_index_to_go == 1 && $types_to_go[0] eq 'b'
6484 # or previous character was one of these:
6485 || $last_nonblank_type_to_go =~ /^([\:\?\,f])$/
6487 # or previous character was opening and this does not close it
6488 || ( $last_nonblank_type_to_go eq '{' && $type ne '}' )
6489 || ( $last_nonblank_type_to_go eq '(' and $type ne ')' )
6491 # or this token is one of these:
6492 || $type =~ /^([\.]|\|\||\&\&)$/
6494 # or this is a closing structure
6495 || ( $last_nonblank_type_to_go eq '}'
6496 && $last_nonblank_token_to_go eq $last_nonblank_type_to_go )
6498 # or previous token was keyword 'return'
6499 || ( $last_nonblank_type_to_go eq 'k'
6500 && ( $last_nonblank_token_to_go eq 'return' && $type ne '{' ) )
6502 # or starting a new line at certain keywords is fine
6504 && $is_if_unless_and_or_last_next_redo_return{$token} )
6506 # or this is after an assignment after a closing structure
6508 $is_assignment{$last_nonblank_type_to_go}
6510 $last_last_nonblank_type_to_go =~ /^[\}\)\]]$/
6512 # and it is significantly to the right
6513 || $gnu_position_predictor > $half_maximum_line_length
6518 check_for_long_gnu_style_lines();
6519 $line_start_index_to_go = $max_index_to_go;
6521 # back up 1 token if we want to break before that type
6522 # otherwise, we may strand tokens like '?' or ':' on a line
6523 if ( $line_start_index_to_go > 0 ) {
6524 if ( $last_nonblank_type_to_go eq 'k' ) {
6526 if ( $want_break_before{$last_nonblank_token_to_go} ) {
6527 $line_start_index_to_go--;
6530 elsif ( $want_break_before{$last_nonblank_type_to_go} ) {
6531 $line_start_index_to_go--;
6537 # remember the predicted position of this token on the output line
6538 if ( $max_index_to_go > $line_start_index_to_go ) {
6539 $gnu_position_predictor =
6540 total_line_length( $line_start_index_to_go, $max_index_to_go );
6543 $gnu_position_predictor = $space_count +
6544 token_sequence_length( $max_index_to_go, $max_index_to_go );
6547 # store the indentation object for this token
6548 # this allows us to manipulate the leading whitespace
6549 # (in case we have to reduce indentation to fit a line) without
6550 # having to change any token values
6551 $leading_spaces_to_go[$max_index_to_go] = $gnu_stack[$max_gnu_stack_index];
6552 $reduced_spaces_to_go[$max_index_to_go] =
6553 ( $max_gnu_stack_index > 0 && $ci_level )
6554 ? $gnu_stack[ $max_gnu_stack_index - 1 ]
6555 : $gnu_stack[$max_gnu_stack_index];
6559 sub check_for_long_gnu_style_lines {
6561 # look at the current estimated maximum line length, and
6562 # remove some whitespace if it exceeds the desired maximum
6564 # this is only for the '-lp' style
6565 return unless ($rOpts_line_up_parentheses);
6567 # nothing can be done if no stack items defined for this line
6568 return if ( $max_gnu_item_index == UNDEFINED_INDEX );
6570 # see if we have exceeded the maximum desired line length
6571 # keep 2 extra free because they are needed in some cases
6572 # (result of trial-and-error testing)
6574 $gnu_position_predictor - $rOpts_maximum_line_length + 2;
6576 return if ( $spaces_needed < 0 );
6578 # We are over the limit, so try to remove a requested number of
6579 # spaces from leading whitespace. We are only allowed to remove
6580 # from whitespace items created on this batch, since others have
6581 # already been used and cannot be undone.
6582 my @candidates = ();
6585 # loop over all whitespace items created for the current batch
6586 for ( $i = 0 ; $i <= $max_gnu_item_index ; $i++ ) {
6587 my $item = $gnu_item_list[$i];
6589 # item must still be open to be a candidate (otherwise it
6590 # cannot influence the current token)
6591 next if ( $item->get_CLOSED() >= 0 );
6593 my $available_spaces = $item->get_AVAILABLE_SPACES();
6595 if ( $available_spaces > 0 ) {
6596 push( @candidates, [ $i, $available_spaces ] );
6600 return unless (@candidates);
6602 # sort by available whitespace so that we can remove whitespace
6603 # from the maximum available first
6604 @candidates = sort { $b->[1] <=> $a->[1] } @candidates;
6606 # keep removing whitespace until we are done or have no more
6608 foreach $candidate (@candidates) {
6609 my ( $i, $available_spaces ) = @{$candidate};
6610 my $deleted_spaces =
6611 ( $available_spaces > $spaces_needed )
6613 : $available_spaces;
6615 # remove the incremental space from this item
6616 $gnu_item_list[$i]->decrease_AVAILABLE_SPACES($deleted_spaces);
6620 # update the leading whitespace of this item and all items
6621 # that came after it
6622 for ( ; $i <= $max_gnu_item_index ; $i++ ) {
6624 my $old_spaces = $gnu_item_list[$i]->get_SPACES();
6625 if ( $old_spaces > $deleted_spaces ) {
6626 $gnu_item_list[$i]->decrease_SPACES($deleted_spaces);
6629 # shouldn't happen except for code bug:
6631 my $level = $gnu_item_list[$i_debug]->get_LEVEL();
6632 my $ci_level = $gnu_item_list[$i_debug]->get_CI_LEVEL();
6633 my $old_level = $gnu_item_list[$i]->get_LEVEL();
6634 my $old_ci_level = $gnu_item_list[$i]->get_CI_LEVEL();
6636 "program bug with -lp: want to delete $deleted_spaces from item $i, but old=$old_spaces deleted: lev=$level ci=$ci_level deleted: level=$old_level ci=$ci_level\n"
6638 report_definite_bug();
6641 $gnu_position_predictor -= $deleted_spaces;
6642 $spaces_needed -= $deleted_spaces;
6643 last unless ( $spaces_needed > 0 );
6647 sub finish_lp_batch {
6649 # This routine is called once after each each output stream batch is
6650 # finished to undo indentation for all incomplete -lp
6651 # indentation levels. It is too risky to leave a level open,
6652 # because then we can't backtrack in case of a long line to follow.
6653 # This means that comments and blank lines will disrupt this
6654 # indentation style. But the vertical aligner may be able to
6655 # get the space back if there are side comments.
6657 # this is only for the 'lp' style
6658 return unless ($rOpts_line_up_parentheses);
6660 # nothing can be done if no stack items defined for this line
6661 return if ( $max_gnu_item_index == UNDEFINED_INDEX );
6663 # loop over all whitespace items created for the current batch
6665 for ( $i = 0 ; $i <= $max_gnu_item_index ; $i++ ) {
6666 my $item = $gnu_item_list[$i];
6668 # only look for open items
6669 next if ( $item->get_CLOSED() >= 0 );
6671 # Tentatively remove all of the available space
6672 # (The vertical aligner will try to get it back later)
6673 my $available_spaces = $item->get_AVAILABLE_SPACES();
6674 if ( $available_spaces > 0 ) {
6676 # delete incremental space for this item
6678 ->tentatively_decrease_AVAILABLE_SPACES($available_spaces);
6680 # Reduce the total indentation space of any nodes that follow
6681 # Note that any such nodes must necessarily be dependents
6683 foreach ( $i + 1 .. $max_gnu_item_index ) {
6684 $gnu_item_list[$_]->decrease_SPACES($available_spaces);
6691 sub reduce_lp_indentation {
6693 # reduce the leading whitespace at token $i if possible by $spaces_needed
6694 # (a large value of $spaces_needed will remove all excess space)
6695 # NOTE: to be called from scan_list only for a sequence of tokens
6696 # contained between opening and closing parens/braces/brackets
6698 my ( $i, $spaces_wanted ) = @_;
6699 my $deleted_spaces = 0;
6701 my $item = $leading_spaces_to_go[$i];
6702 my $available_spaces = $item->get_AVAILABLE_SPACES();
6705 $available_spaces > 0
6706 && ( ( $spaces_wanted <= $available_spaces )
6707 || !$item->get_HAVE_CHILD() )
6711 # we'll remove these spaces, but mark them as recoverable
6713 $item->tentatively_decrease_AVAILABLE_SPACES($spaces_wanted);
6716 return $deleted_spaces;
6719 sub token_sequence_length {
6721 # return length of tokens ($ifirst .. $ilast) including first & last
6722 # returns 0 if $ifirst > $ilast
6725 return 0 if ( $ilast < 0 || $ifirst > $ilast );
6726 return $lengths_to_go[ $ilast + 1 ] if ( $ifirst < 0 );
6727 return $lengths_to_go[ $ilast + 1 ] - $lengths_to_go[$ifirst];
6730 sub total_line_length {
6732 # return length of a line of tokens ($ifirst .. $ilast)
6735 if ( $ifirst < 0 ) { $ifirst = 0 }
6737 return leading_spaces_to_go($ifirst) +
6738 token_sequence_length( $ifirst, $ilast );
6741 sub excess_line_length {
6743 # return number of characters by which a line of tokens ($ifirst..$ilast)
6744 # exceeds the allowable line length.
6747 if ( $ifirst < 0 ) { $ifirst = 0 }
6748 return leading_spaces_to_go($ifirst) +
6749 token_sequence_length( $ifirst, $ilast ) - $rOpts_maximum_line_length;
6752 sub finish_formatting {
6754 # flush buffer and write any informative messages
6758 $file_writer_object->decrement_output_line_number()
6759 ; # fix up line number since it was incremented
6760 we_are_at_the_last_line();
6761 if ( $added_semicolon_count > 0 ) {
6762 my $first = ( $added_semicolon_count > 1 ) ? "First" : "";
6764 ( $added_semicolon_count > 1 ) ? "semicolons were" : "semicolon was";
6765 write_logfile_entry("$added_semicolon_count $what added:\n");
6766 write_logfile_entry(
6767 " $first at input line $first_added_semicolon_at\n");
6769 if ( $added_semicolon_count > 1 ) {
6770 write_logfile_entry(
6771 " Last at input line $last_added_semicolon_at\n");
6773 write_logfile_entry(" (Use -nasc to prevent semicolon addition)\n");
6774 write_logfile_entry("\n");
6777 if ( $deleted_semicolon_count > 0 ) {
6778 my $first = ( $deleted_semicolon_count > 1 ) ? "First" : "";
6780 ( $deleted_semicolon_count > 1 )
6783 write_logfile_entry(
6784 "$deleted_semicolon_count unnecessary $what deleted:\n");
6785 write_logfile_entry(
6786 " $first at input line $first_deleted_semicolon_at\n");
6788 if ( $deleted_semicolon_count > 1 ) {
6789 write_logfile_entry(
6790 " Last at input line $last_deleted_semicolon_at\n");
6792 write_logfile_entry(" (Use -ndsc to prevent semicolon deletion)\n");
6793 write_logfile_entry("\n");
6796 if ( $embedded_tab_count > 0 ) {
6797 my $first = ( $embedded_tab_count > 1 ) ? "First" : "";
6799 ( $embedded_tab_count > 1 )
6800 ? "quotes or patterns"
6801 : "quote or pattern";
6802 write_logfile_entry("$embedded_tab_count $what had embedded tabs:\n");
6803 write_logfile_entry(
6804 "This means the display of this script could vary with device or software\n"
6806 write_logfile_entry(" $first at input line $first_embedded_tab_at\n");
6808 if ( $embedded_tab_count > 1 ) {
6809 write_logfile_entry(
6810 " Last at input line $last_embedded_tab_at\n");
6812 write_logfile_entry("\n");
6815 if ($first_tabbing_disagreement) {
6816 write_logfile_entry(
6817 "First indentation disagreement seen at input line $first_tabbing_disagreement\n"
6821 if ($in_tabbing_disagreement) {
6822 write_logfile_entry(
6823 "Ending with indentation disagreement which started at input line $in_tabbing_disagreement\n"
6828 if ($last_tabbing_disagreement) {
6830 write_logfile_entry(
6831 "Last indentation disagreement seen at input line $last_tabbing_disagreement\n"
6835 write_logfile_entry("No indentation disagreement seen\n");
6838 write_logfile_entry("\n");
6840 $vertical_aligner_object->report_anything_unusual();
6842 $file_writer_object->report_line_length_errors();
6847 # This routine is called to check the Opts hash after it is defined
6850 my ( $tabbing_string, $tab_msg );
6852 make_static_block_comment_pattern();
6853 make_static_side_comment_pattern();
6854 make_closing_side_comment_prefix();
6855 make_closing_side_comment_list_pattern();
6856 $format_skipping_pattern_begin =
6857 make_format_skipping_pattern( 'format-skipping-begin', '#<<<' );
6858 $format_skipping_pattern_end =
6859 make_format_skipping_pattern( 'format-skipping-end', '#>>>' );
6861 # If closing side comments ARE selected, then we can safely
6862 # delete old closing side comments unless closing side comment
6863 # warnings are requested. This is a good idea because it will
6864 # eliminate any old csc's which fall below the line count threshold.
6865 # We cannot do this if warnings are turned on, though, because we
6866 # might delete some text which has been added. So that must
6867 # be handled when comments are created.
6868 if ( $rOpts->{'closing-side-comments'} ) {
6869 if ( !$rOpts->{'closing-side-comment-warnings'} ) {
6870 $rOpts->{'delete-closing-side-comments'} = 1;
6874 # If closing side comments ARE NOT selected, but warnings ARE
6875 # selected and we ARE DELETING csc's, then we will pretend to be
6876 # adding with a huge interval. This will force the comments to be
6877 # generated for comparison with the old comments, but not added.
6878 elsif ( $rOpts->{'closing-side-comment-warnings'} ) {
6879 if ( $rOpts->{'delete-closing-side-comments'} ) {
6880 $rOpts->{'delete-closing-side-comments'} = 0;
6881 $rOpts->{'closing-side-comments'} = 1;
6882 $rOpts->{'closing-side-comment-interval'} = 100000000;
6887 make_block_brace_vertical_tightness_pattern();
6889 if ( $rOpts->{'line-up-parentheses'} ) {
6891 if ( $rOpts->{'indent-only'}
6892 || !$rOpts->{'add-newlines'}
6893 || !$rOpts->{'delete-old-newlines'} )
6896 -----------------------------------------------------------------------
6897 Conflict: -lp conflicts with -io, -fnl, -nanl, or -ndnl; ignoring -lp
6899 The -lp indentation logic requires that perltidy be able to coordinate
6900 arbitrarily large numbers of line breakpoints. This isn't possible
6901 with these flags. Sometimes an acceptable workaround is to use -wocb=3
6902 -----------------------------------------------------------------------
6904 $rOpts->{'line-up-parentheses'} = 0;
6908 # At present, tabs are not compatable with the line-up-parentheses style
6909 # (it would be possible to entab the total leading whitespace
6910 # just prior to writing the line, if desired).
6911 if ( $rOpts->{'line-up-parentheses'} && $rOpts->{'tabs'} ) {
6913 Conflict: -t (tabs) cannot be used with the -lp option; ignoring -t; see -et.
6915 $rOpts->{'tabs'} = 0;
6918 # Likewise, tabs are not compatable with outdenting..
6919 if ( $rOpts->{'outdent-keywords'} && $rOpts->{'tabs'} ) {
6921 Conflict: -t (tabs) cannot be used with the -okw options; ignoring -t; see -et.
6923 $rOpts->{'tabs'} = 0;
6926 if ( $rOpts->{'outdent-labels'} && $rOpts->{'tabs'} ) {
6928 Conflict: -t (tabs) cannot be used with the -ola option; ignoring -t; see -et.
6930 $rOpts->{'tabs'} = 0;
6933 if ( !$rOpts->{'space-for-semicolon'} ) {
6934 $want_left_space{'f'} = -1;
6937 if ( $rOpts->{'space-terminal-semicolon'} ) {
6938 $want_left_space{';'} = 1;
6941 # implement outdenting preferences for keywords
6942 %outdent_keyword = ();
6943 unless ( @_ = split_words( $rOpts->{'outdent-keyword-okl'} ) ) {
6944 @_ = qw(next last redo goto return); # defaults
6947 # FUTURE: if not a keyword, assume that it is an identifier
6949 if ( $Perl::Tidy::Tokenizer::is_keyword{$_} ) {
6950 $outdent_keyword{$_} = 1;
6953 warn "ignoring '$_' in -okwl list; not a perl keyword";
6957 # implement user whitespace preferences
6958 if ( @_ = split_words( $rOpts->{'want-left-space'} ) ) {
6959 @want_left_space{@_} = (1) x scalar(@_);
6962 if ( @_ = split_words( $rOpts->{'want-right-space'} ) ) {
6963 @want_right_space{@_} = (1) x scalar(@_);
6966 if ( @_ = split_words( $rOpts->{'nowant-left-space'} ) ) {
6967 @want_left_space{@_} = (-1) x scalar(@_);
6970 if ( @_ = split_words( $rOpts->{'nowant-right-space'} ) ) {
6971 @want_right_space{@_} = (-1) x scalar(@_);
6973 if ( $rOpts->{'dump-want-left-space'} ) {
6974 dump_want_left_space(*STDOUT);
6978 if ( $rOpts->{'dump-want-right-space'} ) {
6979 dump_want_right_space(*STDOUT);
6983 # default keywords for which space is introduced before an opening paren
6984 # (at present, including them messes up vertical alignment)
6985 @_ = qw(my local our and or err eq ne if else elsif until
6986 unless while for foreach return switch case given when);
6987 @space_after_keyword{@_} = (1) x scalar(@_);
6989 # allow user to modify these defaults
6990 if ( @_ = split_words( $rOpts->{'space-after-keyword'} ) ) {
6991 @space_after_keyword{@_} = (1) x scalar(@_);
6994 if ( @_ = split_words( $rOpts->{'nospace-after-keyword'} ) ) {
6995 @space_after_keyword{@_} = (0) x scalar(@_);
6998 # implement user break preferences
6999 foreach my $tok ( split_words( $rOpts->{'want-break-after'} ) ) {
7000 if ( $tok eq '?' ) { $tok = ':' } # patch to coordinate ?/:
7001 my $lbs = $left_bond_strength{$tok};
7002 my $rbs = $right_bond_strength{$tok};
7003 if ( defined($lbs) && defined($rbs) && $lbs < $rbs ) {
7004 ( $right_bond_strength{$tok}, $left_bond_strength{$tok} ) =
7009 foreach my $tok ( split_words( $rOpts->{'want-break-before'} ) ) {
7010 my $lbs = $left_bond_strength{$tok};
7011 my $rbs = $right_bond_strength{$tok};
7012 if ( defined($lbs) && defined($rbs) && $rbs < $lbs ) {
7013 ( $right_bond_strength{$tok}, $left_bond_strength{$tok} ) =
7018 # make note if breaks are before certain key types
7019 %want_break_before = ();
7021 '=', '.', ',', ':', '?', '&&', '||', 'and',
7022 'or', 'err', 'xor', '+', '-', '*', '/',
7025 $want_break_before{$tok} =
7026 $left_bond_strength{$tok} < $right_bond_strength{$tok};
7029 # Coordinate ?/: breaks, which must be similar
7030 if ( !$want_break_before{':'} ) {
7031 $want_break_before{'?'} = $want_break_before{':'};
7032 $right_bond_strength{'?'} = $right_bond_strength{':'} + 0.01;
7033 $left_bond_strength{'?'} = NO_BREAK;
7036 # Define here tokens which may follow the closing brace of a do statement
7037 # on the same line, as in:
7038 # } while ( $something);
7039 @_ = qw(until while unless if ; : );
7041 @is_do_follower{@_} = (1) x scalar(@_);
7043 # These tokens may follow the closing brace of an if or elsif block.
7044 # In other words, for cuddled else we want code to look like:
7045 # } elsif ( $something) {
7047 if ( $rOpts->{'cuddled-else'} ) {
7048 @_ = qw(else elsif);
7049 @is_if_brace_follower{@_} = (1) x scalar(@_);
7052 %is_if_brace_follower = ();
7055 # nothing can follow the closing curly of an else { } block:
7056 %is_else_brace_follower = ();
7058 # what can follow a multi-line anonymous sub definition closing curly:
7059 @_ = qw# ; : => or and && || ~~ !~~ ) #;
7061 @is_anon_sub_brace_follower{@_} = (1) x scalar(@_);
7063 # what can follow a one-line anonynomous sub closing curly:
7064 # one-line anonumous subs also have ']' here...
7065 # see tk3.t and PP.pm
7066 @_ = qw# ; : => or and && || ) ] ~~ !~~ #;
7068 @is_anon_sub_1_brace_follower{@_} = (1) x scalar(@_);
7070 # What can follow a closing curly of a block
7071 # which is not an if/elsif/else/do/sort/map/grep/eval/sub
7072 # Testfiles: 'Toolbar.pm', 'Menubar.pm', bless.t, '3rules.pl'
7073 @_ = qw# ; : => or and && || ) #;
7076 # allow cuddled continue if cuddled else is specified
7077 if ( $rOpts->{'cuddled-else'} ) { push @_, 'continue'; }
7079 @is_other_brace_follower{@_} = (1) x scalar(@_);
7081 $right_bond_strength{'{'} = WEAK;
7082 $left_bond_strength{'{'} = VERY_STRONG;
7084 # make -l=0 equal to -l=infinite
7085 if ( !$rOpts->{'maximum-line-length'} ) {
7086 $rOpts->{'maximum-line-length'} = 1000000;
7089 # make -lbl=0 equal to -lbl=infinite
7090 if ( !$rOpts->{'long-block-line-count'} ) {
7091 $rOpts->{'long-block-line-count'} = 1000000;
7094 my $ole = $rOpts->{'output-line-ending'};
7103 unless ( $rOpts->{'output-line-ending'} = $endings{$ole} ) {
7104 my $str = join " ", keys %endings;
7106 Unrecognized line ending '$ole'; expecting one of: $str
7109 if ( $rOpts->{'preserve-line-endings'} ) {
7110 warn "Ignoring -ple; conflicts with -ole\n";
7111 $rOpts->{'preserve-line-endings'} = undef;
7115 # hashes used to simplify setting whitespace
7117 '{' => $rOpts->{'brace-tightness'},
7118 '}' => $rOpts->{'brace-tightness'},
7119 '(' => $rOpts->{'paren-tightness'},
7120 ')' => $rOpts->{'paren-tightness'},
7121 '[' => $rOpts->{'square-bracket-tightness'},
7122 ']' => $rOpts->{'square-bracket-tightness'},
7131 # frequently used parameters
7132 $rOpts_add_newlines = $rOpts->{'add-newlines'};
7133 $rOpts_add_whitespace = $rOpts->{'add-whitespace'};
7134 $rOpts_block_brace_tightness = $rOpts->{'block-brace-tightness'};
7135 $rOpts_block_brace_vertical_tightness =
7136 $rOpts->{'block-brace-vertical-tightness'};
7137 $rOpts_brace_left_and_indent = $rOpts->{'brace-left-and-indent'};
7138 $rOpts_comma_arrow_breakpoints = $rOpts->{'comma-arrow-breakpoints'};
7139 $rOpts_break_at_old_ternary_breakpoints =
7140 $rOpts->{'break-at-old-ternary-breakpoints'};
7141 $rOpts_break_at_old_comma_breakpoints =
7142 $rOpts->{'break-at-old-comma-breakpoints'};
7143 $rOpts_break_at_old_keyword_breakpoints =
7144 $rOpts->{'break-at-old-keyword-breakpoints'};
7145 $rOpts_break_at_old_logical_breakpoints =
7146 $rOpts->{'break-at-old-logical-breakpoints'};
7147 $rOpts_closing_side_comment_else_flag =
7148 $rOpts->{'closing-side-comment-else-flag'};
7149 $rOpts_closing_side_comment_maximum_text =
7150 $rOpts->{'closing-side-comment-maximum-text'};
7151 $rOpts_continuation_indentation = $rOpts->{'continuation-indentation'};
7152 $rOpts_cuddled_else = $rOpts->{'cuddled-else'};
7153 $rOpts_delete_old_whitespace = $rOpts->{'delete-old-whitespace'};
7154 $rOpts_fuzzy_line_length = $rOpts->{'fuzzy-line-length'};
7155 $rOpts_indent_columns = $rOpts->{'indent-columns'};
7156 $rOpts_line_up_parentheses = $rOpts->{'line-up-parentheses'};
7157 $rOpts_maximum_fields_per_table = $rOpts->{'maximum-fields-per-table'};
7158 $rOpts_maximum_line_length = $rOpts->{'maximum-line-length'};
7159 $rOpts_short_concatenation_item_length =
7160 $rOpts->{'short-concatenation-item-length'};
7161 $rOpts_swallow_optional_blank_lines =
7162 $rOpts->{'swallow-optional-blank-lines'};
7163 $rOpts_ignore_old_breakpoints = $rOpts->{'ignore-old-breakpoints'};
7164 $rOpts_format_skipping = $rOpts->{'format-skipping'};
7165 $rOpts_space_function_paren = $rOpts->{'space-function-paren'};
7166 $rOpts_space_keyword_paren = $rOpts->{'space-keyword-paren'};
7167 $half_maximum_line_length = $rOpts_maximum_line_length / 2;
7169 # Note that both opening and closing tokens can access the opening
7170 # and closing flags of their container types.
7171 %opening_vertical_tightness = (
7172 '(' => $rOpts->{'paren-vertical-tightness'},
7173 '{' => $rOpts->{'brace-vertical-tightness'},
7174 '[' => $rOpts->{'square-bracket-vertical-tightness'},
7175 ')' => $rOpts->{'paren-vertical-tightness'},
7176 '}' => $rOpts->{'brace-vertical-tightness'},
7177 ']' => $rOpts->{'square-bracket-vertical-tightness'},
7180 %closing_vertical_tightness = (
7181 '(' => $rOpts->{'paren-vertical-tightness-closing'},
7182 '{' => $rOpts->{'brace-vertical-tightness-closing'},
7183 '[' => $rOpts->{'square-bracket-vertical-tightness-closing'},
7184 ')' => $rOpts->{'paren-vertical-tightness-closing'},
7185 '}' => $rOpts->{'brace-vertical-tightness-closing'},
7186 ']' => $rOpts->{'square-bracket-vertical-tightness-closing'},
7189 # assume flag for '>' same as ')' for closing qw quotes
7190 %closing_token_indentation = (
7191 ')' => $rOpts->{'closing-paren-indentation'},
7192 '}' => $rOpts->{'closing-brace-indentation'},
7193 ']' => $rOpts->{'closing-square-bracket-indentation'},
7194 '>' => $rOpts->{'closing-paren-indentation'},
7197 %opening_token_right = (
7198 '(' => $rOpts->{'opening-paren-right'},
7199 '{' => $rOpts->{'opening-hash-brace-right'},
7200 '[' => $rOpts->{'opening-square-bracket-right'},
7203 %stack_opening_token = (
7204 '(' => $rOpts->{'stack-opening-paren'},
7205 '{' => $rOpts->{'stack-opening-hash-brace'},
7206 '[' => $rOpts->{'stack-opening-square-bracket'},
7209 %stack_closing_token = (
7210 ')' => $rOpts->{'stack-closing-paren'},
7211 '}' => $rOpts->{'stack-closing-hash-brace'},
7212 ']' => $rOpts->{'stack-closing-square-bracket'},
7216 sub make_static_block_comment_pattern {
7218 # create the pattern used to identify static block comments
7219 $static_block_comment_pattern = '^\s*##';
7221 # allow the user to change it
7222 if ( $rOpts->{'static-block-comment-prefix'} ) {
7223 my $prefix = $rOpts->{'static-block-comment-prefix'};
7224 $prefix =~ s/^\s*//;
7225 my $pattern = $prefix;
7227 # user may give leading caret to force matching left comments only
7228 if ( $prefix !~ /^\^#/ ) {
7229 if ( $prefix !~ /^#/ ) {
7231 "ERROR: the -sbcp prefix is '$prefix' but must begin with '#' or '^#'\n";
7233 $pattern = '^\s*' . $prefix;
7235 eval "'##'=~/$pattern/";
7238 "ERROR: the -sbc prefix '$prefix' causes the invalid regex '$pattern'\n";
7240 $static_block_comment_pattern = $pattern;
7244 sub make_format_skipping_pattern {
7245 my ( $opt_name, $default ) = @_;
7246 my $param = $rOpts->{$opt_name};
7247 unless ($param) { $param = $default }
7249 if ( $param !~ /^#/ ) {
7250 die "ERROR: the $opt_name parameter '$param' must begin with '#'\n";
7252 my $pattern = '^' . $param . '\s';
7253 eval "'#'=~/$pattern/";
7256 "ERROR: the $opt_name parameter '$param' causes the invalid regex '$pattern'\n";
7261 sub make_closing_side_comment_list_pattern {
7263 # turn any input list into a regex for recognizing selected block types
7264 $closing_side_comment_list_pattern = '^\w+';
7265 if ( defined( $rOpts->{'closing-side-comment-list'} )
7266 && $rOpts->{'closing-side-comment-list'} )
7268 $closing_side_comment_list_pattern =
7269 make_block_pattern( '-cscl', $rOpts->{'closing-side-comment-list'} );
7273 sub make_bli_pattern {
7275 if ( defined( $rOpts->{'brace-left-and-indent-list'} )
7276 && $rOpts->{'brace-left-and-indent-list'} )
7278 $bli_list_string = $rOpts->{'brace-left-and-indent-list'};
7281 $bli_pattern = make_block_pattern( '-blil', $bli_list_string );
7284 sub make_block_brace_vertical_tightness_pattern {
7286 # turn any input list into a regex for recognizing selected block types
7287 $block_brace_vertical_tightness_pattern =
7288 '^((if|else|elsif|unless|while|for|foreach|do|\w+:)$|sub)';
7290 if ( defined( $rOpts->{'block-brace-vertical-tightness-list'} )
7291 && $rOpts->{'block-brace-vertical-tightness-list'} )
7293 $block_brace_vertical_tightness_pattern =
7294 make_block_pattern( '-bbvtl',
7295 $rOpts->{'block-brace-vertical-tightness-list'} );
7299 sub make_block_pattern {
7301 # given a string of block-type keywords, return a regex to match them
7302 # The only tricky part is that labels are indicated with a single ':'
7303 # and the 'sub' token text may have additional text after it (name of
7308 # input string: "if else elsif unless while for foreach do : sub";
7309 # pattern: '^((if|else|elsif|unless|while|for|foreach|do|\w+:)$|sub)';
7311 my ( $abbrev, $string ) = @_;
7312 my @list = split_words($string);
7318 if ( $i eq 'sub' ) {
7320 elsif ( $i eq ':' ) {
7321 push @words, '\w+:';
7323 elsif ( $i =~ /^\w/ ) {
7327 warn "unrecognized block type $i after $abbrev, ignoring\n";
7330 my $pattern = '(' . join( '|', @words ) . ')$';
7331 if ( $seen{'sub'} ) {
7332 $pattern = '(' . $pattern . '|sub)';
7334 $pattern = '^' . $pattern;
7338 sub make_static_side_comment_pattern {
7340 # create the pattern used to identify static side comments
7341 $static_side_comment_pattern = '^##';
7343 # allow the user to change it
7344 if ( $rOpts->{'static-side-comment-prefix'} ) {
7345 my $prefix = $rOpts->{'static-side-comment-prefix'};
7346 $prefix =~ s/^\s*//;
7347 my $pattern = '^' . $prefix;
7348 eval "'##'=~/$pattern/";
7351 "ERROR: the -sscp prefix '$prefix' causes the invalid regex '$pattern'\n";
7353 $static_side_comment_pattern = $pattern;
7357 sub make_closing_side_comment_prefix {
7359 # Be sure we have a valid closing side comment prefix
7360 my $csc_prefix = $rOpts->{'closing-side-comment-prefix'};
7361 my $csc_prefix_pattern;
7362 if ( !defined($csc_prefix) ) {
7363 $csc_prefix = '## end';
7364 $csc_prefix_pattern = '^##\s+end';
7367 my $test_csc_prefix = $csc_prefix;
7368 if ( $test_csc_prefix !~ /^#/ ) {
7369 $test_csc_prefix = '#' . $test_csc_prefix;
7372 # make a regex to recognize the prefix
7373 my $test_csc_prefix_pattern = $test_csc_prefix;
7375 # escape any special characters
7376 $test_csc_prefix_pattern =~ s/([^#\s\w])/\\$1/g;
7378 $test_csc_prefix_pattern = '^' . $test_csc_prefix_pattern;
7380 # allow exact number of intermediate spaces to vary
7381 $test_csc_prefix_pattern =~ s/\s+/\\s\+/g;
7383 # make sure we have a good pattern
7384 # if we fail this we probably have an error in escaping
7386 eval "'##'=~/$test_csc_prefix_pattern/";
7389 # shouldn't happen..must have screwed up escaping, above
7390 report_definite_bug();
7392 "Program Error: the -cscp prefix '$csc_prefix' caused the invalid regex '$csc_prefix_pattern'\n";
7394 # just warn and keep going with defaults
7395 warn "Please consider using a simpler -cscp prefix\n";
7396 warn "Using default -cscp instead; please check output\n";
7399 $csc_prefix = $test_csc_prefix;
7400 $csc_prefix_pattern = $test_csc_prefix_pattern;
7403 $rOpts->{'closing-side-comment-prefix'} = $csc_prefix;
7404 $closing_side_comment_prefix_pattern = $csc_prefix_pattern;
7407 sub dump_want_left_space {
7411 These values are the main control of whitespace to the left of a token type;
7412 They may be altered with the -wls parameter.
7413 For a list of token types, use perltidy --dump-token-types (-dtt)
7414 1 means the token wants a space to its left
7415 -1 means the token does not want a space to its left
7416 ------------------------------------------------------------------------
7418 foreach ( sort keys %want_left_space ) {
7419 print $fh "$_\t$want_left_space{$_}\n";
7423 sub dump_want_right_space {
7427 These values are the main control of whitespace to the right of a token type;
7428 They may be altered with the -wrs parameter.
7429 For a list of token types, use perltidy --dump-token-types (-dtt)
7430 1 means the token wants a space to its right
7431 -1 means the token does not want a space to its right
7432 ------------------------------------------------------------------------
7434 foreach ( sort keys %want_right_space ) {
7435 print $fh "$_\t$want_right_space{$_}\n";
7439 { # begin is_essential_whitespace
7441 my %is_sort_grep_map;
7446 @_ = qw(sort grep map);
7447 @is_sort_grep_map{@_} = (1) x scalar(@_);
7449 @_ = qw(for foreach);
7450 @is_for_foreach{@_} = (1) x scalar(@_);
7454 sub is_essential_whitespace {
7456 # Essential whitespace means whitespace which cannot be safely deleted
7457 # without risking the introduction of a syntax error.
7458 # We are given three tokens and their types:
7459 # ($tokenl, $typel) is the token to the left of the space in question
7460 # ($tokenr, $typer) is the token to the right of the space in question
7461 # ($tokenll, $typell) is previous nonblank token to the left of $tokenl
7463 # This is a slow routine but is not needed too often except when -mangle
7466 # Note: This routine should almost never need to be changed. It is
7467 # for avoiding syntax problems rather than for formatting.
7468 my ( $tokenll, $typell, $tokenl, $typel, $tokenr, $typer ) = @_;
7472 # never combine two bare words or numbers
7473 # examples: and ::ok(1)
7475 # for bla::bla:: abc
7476 # example is "%overload:: and" in files Dumpvalue.pm or colonbug.pl
7477 # $input eq"quit" to make $inputeq"quit"
7478 # my $size=-s::SINK if $file; <==OK but we won't do it
7479 # don't join something like: for bla::bla:: abc
7480 # example is "%overload:: and" in files Dumpvalue.pm or colonbug.pl
7481 ( ( $tokenl =~ /([\'\w]|\:\:)$/ ) && ( $tokenr =~ /^([\'\w]|\:\:)/ ) )
7483 # do not combine a number with a concatination dot
7484 # example: pom.caputo:
7485 # $vt100_compatible ? "\e[0;0H" : ('-' x 78 . "\n");
7486 || ( ( $typel eq 'n' ) && ( $tokenr eq '.' ) )
7487 || ( ( $typer eq 'n' ) && ( $tokenl eq '.' ) )
7489 # do not join a minus with a bare word, because you might form
7490 # a file test operator. Example from Complex.pm:
7491 # if (CORE::abs($z - i) < $eps); "z-i" would be taken as a file test.
7492 || ( ( $tokenl eq '-' ) && ( $tokenr =~ /^[_A-Za-z]$/ ) )
7494 # and something like this could become ambiguous without space
7496 # use constant III=>1;
7500 || ( ( $tokenl eq '-' )
7501 && ( $typer =~ /^[wC]$/ && $tokenr =~ /^[_A-Za-z]/ ) )
7503 # '= -' should not become =- or you will get a warning
7505 # || ($tokenr eq '-')
7507 # keep a space between a quote and a bareword to prevent the
7508 # bareword from becomming a quote modifier.
7509 || ( ( $typel eq 'Q' ) && ( $tokenr =~ /^[a-zA-Z_]/ ) )
7511 # keep a space between a token ending in '$' and any word;
7512 # this caused trouble: "die @$ if $@"
7513 || ( ( $typel eq 'i' && $tokenl =~ /\$$/ )
7514 && ( $tokenr =~ /^[a-zA-Z_]/ ) )
7516 # perl is very fussy about spaces before <<
7517 || ( $tokenr =~ /^\<\</ )
7519 # avoid combining tokens to create new meanings. Example:
7520 # $a+ +$b must not become $a++$b
7521 || ( $is_digraph{ $tokenl . $tokenr } )
7522 || ( $is_trigraph{ $tokenl . $tokenr } )
7524 # another example: do not combine these two &'s:
7525 # allow_options & &OPT_EXECCGI
7526 || ( $is_digraph{ $tokenl . substr( $tokenr, 0, 1 ) } )
7528 # don't combine $$ or $# with any alphanumeric
7529 # (testfile mangle.t with --mangle)
7530 || ( ( $tokenl =~ /^\$[\$\#]$/ ) && ( $tokenr =~ /^\w/ ) )
7532 # retain any space after possible filehandle
7533 # (testfiles prnterr1.t with --extrude and mangle.t with --mangle)
7534 || ( $typel eq 'Z' )
7536 # Perl is sensitive to whitespace after the + here:
7537 # $b = xvals $a + 0.1 * yvals $a;
7538 || ( $typell eq 'Z' && $typel =~ /^[\/\?\+\-\*]$/ )
7540 # keep paren separate in 'use Foo::Bar ()'
7544 && $tokenll eq 'use' )
7546 # keep any space between filehandle and paren:
7547 # file mangle.t with --mangle:
7548 || ( $typel eq 'Y' && $tokenr eq '(' )
7550 # retain any space after here doc operator ( hereerr.t)
7551 || ( $typel eq 'h' )
7553 # FIXME: this needs some further work; extrude.t has test cases
7554 # it is safest to retain any space after start of ? : operator
7555 # because of perl's quirky parser.
7556 # ie, this line will fail if you remove the space after the '?':
7557 # $b=join $comma ? ',' : ':', @_; # ok
7558 # $b=join $comma ?',' : ':', @_; # error!
7560 # $b=join $comma?',' : ':', @_; # not a problem!
7561 ## || ($typel eq '?')
7563 # be careful with a space around ++ and --, to avoid ambiguity as to
7564 # which token it applies
7565 || ( ( $typer =~ /^(pp|mm)$/ ) && ( $tokenl !~ /^[\;\{\(\[]/ ) )
7566 || ( ( $typel =~ /^(\+\+|\-\-)$/ ) && ( $tokenr !~ /^[\;\}\)\]]/ ) )
7568 # need space after foreach my; for example, this will fail in
7569 # older versions of Perl:
7570 # foreach my$ft(@filetypes)...
7575 && $is_for_foreach{$tokenll}
7579 # must have space between grep and left paren; "grep(" will fail
7580 || ( $tokenr eq '(' && $is_sort_grep_map{$tokenl} )
7582 # don't stick numbers next to left parens, as in:
7583 #use Mail::Internet 1.28 (); (see Entity.pm, Head.pm, Test.pm)
7584 || ( ( $typel eq 'n' ) && ( $tokenr eq '(' ) )
7586 ; # the value of this long logic sequence is the result we want
7591 sub set_white_space_flag {
7593 # This routine examines each pair of nonblank tokens and
7594 # sets values for array @white_space_flag.
7596 # $white_space_flag[$j] is a flag indicating whether a white space
7597 # BEFORE token $j is needed, with the following values:
7599 # -1 do not want a space before token $j
7600 # 0 optional space or $j is a whitespace
7601 # 1 want a space before token $j
7604 # The values for the first token will be defined based
7605 # upon the contents of the "to_go" output array.
7607 # Note: retain debug print statements because they are usually
7608 # required after adding new token types.
7612 # initialize these global hashes, which control the use of
7613 # whitespace around tokens:
7618 # %space_after_keyword
7620 # Many token types are identical to the tokens themselves.
7621 # See the tokenizer for a complete list. Here are some special types:
7623 # f = semicolon in for statement
7626 # Note that :: is excluded since it should be contained in an identifier
7627 # Note that '->' is excluded because it never gets space
7628 # parentheses and brackets are excluded since they are handled specially
7629 # curly braces are included but may be overridden by logic, such as
7632 # NEW_TOKENS: create a whitespace rule here. This can be as
7633 # simple as adding your new letter to @spaces_both_sides, for
7637 @is_opening_type{@_} = (1) x scalar(@_);
7640 @is_closing_type{@_} = (1) x scalar(@_);
7642 my @spaces_both_sides = qw"
7643 + - * / % ? = . : x < > | & ^ .. << >> ** && .. || // => += -=
7644 .= %= x= &= |= ^= *= <> <= >= == =~ !~ /= != ... <<= >>= ~~ !~~
7645 &&= ||= //= <=> A k f w F n C Y U G v
7648 my @spaces_left_side = qw"
7649 t ! ~ m p { \ h pp mm Z j
7651 push( @spaces_left_side, '#' ); # avoids warning message
7653 my @spaces_right_side = qw"
7654 ; } ) ] R J ++ -- **=
7656 push( @spaces_right_side, ',' ); # avoids warning message
7657 @want_left_space{@spaces_both_sides} = (1) x scalar(@spaces_both_sides);
7658 @want_right_space{@spaces_both_sides} =
7659 (1) x scalar(@spaces_both_sides);
7660 @want_left_space{@spaces_left_side} = (1) x scalar(@spaces_left_side);
7661 @want_right_space{@spaces_left_side} = (-1) x scalar(@spaces_left_side);
7662 @want_left_space{@spaces_right_side} =
7663 (-1) x scalar(@spaces_right_side);
7664 @want_right_space{@spaces_right_side} =
7665 (1) x scalar(@spaces_right_side);
7666 $want_left_space{'L'} = WS_NO;
7667 $want_left_space{'->'} = WS_NO;
7668 $want_right_space{'->'} = WS_NO;
7669 $want_left_space{'**'} = WS_NO;
7670 $want_right_space{'**'} = WS_NO;
7672 # hash type information must stay tightly bound
7674 $binary_ws_rules{'i'}{'L'} = WS_NO;
7675 $binary_ws_rules{'i'}{'{'} = WS_YES;
7676 $binary_ws_rules{'k'}{'{'} = WS_YES;
7677 $binary_ws_rules{'U'}{'{'} = WS_YES;
7678 $binary_ws_rules{'i'}{'['} = WS_NO;
7679 $binary_ws_rules{'R'}{'L'} = WS_NO;
7680 $binary_ws_rules{'R'}{'{'} = WS_NO;
7681 $binary_ws_rules{'t'}{'L'} = WS_NO;
7682 $binary_ws_rules{'t'}{'{'} = WS_NO;
7683 $binary_ws_rules{'}'}{'L'} = WS_NO;
7684 $binary_ws_rules{'}'}{'{'} = WS_NO;
7685 $binary_ws_rules{'$'}{'L'} = WS_NO;
7686 $binary_ws_rules{'$'}{'{'} = WS_NO;
7687 $binary_ws_rules{'@'}{'L'} = WS_NO;
7688 $binary_ws_rules{'@'}{'{'} = WS_NO;
7689 $binary_ws_rules{'='}{'L'} = WS_YES;
7691 # the following includes ') {'
7692 # as in : if ( xxx ) { yyy }
7693 $binary_ws_rules{']'}{'L'} = WS_NO;
7694 $binary_ws_rules{']'}{'{'} = WS_NO;
7695 $binary_ws_rules{')'}{'{'} = WS_YES;
7696 $binary_ws_rules{')'}{'['} = WS_NO;
7697 $binary_ws_rules{']'}{'['} = WS_NO;
7698 $binary_ws_rules{']'}{'{'} = WS_NO;
7699 $binary_ws_rules{'}'}{'['} = WS_NO;
7700 $binary_ws_rules{'R'}{'['} = WS_NO;
7702 $binary_ws_rules{']'}{'++'} = WS_NO;
7703 $binary_ws_rules{']'}{'--'} = WS_NO;
7704 $binary_ws_rules{')'}{'++'} = WS_NO;
7705 $binary_ws_rules{')'}{'--'} = WS_NO;
7707 $binary_ws_rules{'R'}{'++'} = WS_NO;
7708 $binary_ws_rules{'R'}{'--'} = WS_NO;
7710 ########################################################
7711 # should no longer be necessary (see niek.pl)
7712 ##$binary_ws_rules{'k'}{':'} = WS_NO; # keep colon with label
7713 ##$binary_ws_rules{'w'}{':'} = WS_NO;
7714 ########################################################
7715 $binary_ws_rules{'i'}{'Q'} = WS_YES;
7716 $binary_ws_rules{'n'}{'('} = WS_YES; # occurs in 'use package n ()'
7718 # FIXME: we need to split 'i' into variables and functions
7719 # and have no space for functions but space for variables. For now,
7720 # I have a special patch in the special rules below
7721 $binary_ws_rules{'i'}{'('} = WS_NO;
7723 $binary_ws_rules{'w'}{'('} = WS_NO;
7724 $binary_ws_rules{'w'}{'{'} = WS_YES;
7726 my ( $jmax, $rtokens, $rtoken_type, $rblock_type ) = @_;
7727 my ( $last_token, $last_type, $last_block_type, $token, $type,
7729 my (@white_space_flag);
7730 my $j_tight_closing_paren = -1;
7732 if ( $max_index_to_go >= 0 ) {
7733 $token = $tokens_to_go[$max_index_to_go];
7734 $type = $types_to_go[$max_index_to_go];
7735 $block_type = $block_type_to_go[$max_index_to_go];
7743 # loop over all tokens
7746 for ( $j = 0 ; $j <= $jmax ; $j++ ) {
7748 if ( $$rtoken_type[$j] eq 'b' ) {
7749 $white_space_flag[$j] = WS_OPTIONAL;
7753 # set a default value, to be changed as needed
7755 $last_token = $token;
7757 $last_block_type = $block_type;
7758 $token = $$rtokens[$j];
7759 $type = $$rtoken_type[$j];
7760 $block_type = $$rblock_type[$j];
7762 #---------------------------------------------------------------
7764 # handle space on the inside of opening braces
7765 #---------------------------------------------------------------
7768 if ( $is_opening_type{$last_type} ) {
7770 $j_tight_closing_paren = -1;
7772 # let's keep empty matched braces together: () {} []
7774 if ( $token eq $matching_token{$last_token} ) {
7784 # we're considering the right of an opening brace
7785 # tightness = 0 means always pad inside with space
7786 # tightness = 1 means pad inside if "complex"
7787 # tightness = 2 means never pad inside with space
7790 if ( $last_type eq '{'
7791 && $last_token eq '{'
7792 && $last_block_type )
7794 $tightness = $rOpts_block_brace_tightness;
7796 else { $tightness = $tightness{$last_token} }
7798 if ( $tightness <= 0 ) {
7801 elsif ( $tightness > 1 ) {
7806 # Patch to count '-foo' as single token so that
7807 # each of $a{-foo} and $a{foo} and $a{'foo'} do
7808 # not get spaces with default formatting.
7812 && $last_token eq '{'
7813 && $$rtoken_type[ $j + 1 ] eq 'w' );
7815 # $j_next is where a closing token should be if
7816 # the container has a single token
7818 ( $$rtoken_type[ $j_here + 1 ] eq 'b' )
7821 my $tok_next = $$rtokens[$j_next];
7822 my $type_next = $$rtoken_type[$j_next];
7824 # for tightness = 1, if there is just one token
7825 # within the matching pair, we will keep it tight
7827 $tok_next eq $matching_token{$last_token}
7829 # but watch out for this: [ [ ] (misc.t)
7830 && $last_token ne $token
7834 # remember where to put the space for the closing paren
7835 $j_tight_closing_paren = $j_next;
7843 } # done with opening braces and brackets
7845 if FORMATTER_DEBUG_FLAG_WHITE;
7847 #---------------------------------------------------------------
7849 # handle space on inside of closing brace pairs
7850 #---------------------------------------------------------------
7853 if ( $is_closing_type{$type} ) {
7855 if ( $j == $j_tight_closing_paren ) {
7857 $j_tight_closing_paren = -1;
7862 if ( !defined($ws) ) {
7865 if ( $type eq '}' && $token eq '}' && $block_type ) {
7866 $tightness = $rOpts_block_brace_tightness;
7868 else { $tightness = $tightness{$token} }
7870 $ws = ( $tightness > 1 ) ? WS_NO : WS_YES;
7876 if FORMATTER_DEBUG_FLAG_WHITE;
7878 #---------------------------------------------------------------
7880 # use the binary table
7881 #---------------------------------------------------------------
7882 if ( !defined($ws) ) {
7883 $ws = $binary_ws_rules{$last_type}{$type};
7886 if FORMATTER_DEBUG_FLAG_WHITE;
7888 #---------------------------------------------------------------
7890 # some special cases
7891 #---------------------------------------------------------------
7892 if ( $token eq '(' ) {
7894 # This will have to be tweaked as tokenization changes.
7895 # We usually want a space at '} (', for example:
7896 # map { 1 * $_; } ( $y, $M, $w, $d, $h, $m, $s );
7899 # &{ $_->[1] }( delete $_[$#_]{ $_->[0] } );
7900 # At present, the above & block is marked as type L/R so this case
7901 # won't go through here.
7902 if ( $last_type eq '}' ) { $ws = WS_YES }
7904 # NOTE: some older versions of Perl had occasional problems if
7905 # spaces are introduced between keywords or functions and opening
7906 # parens. So the default is not to do this except is certain
7907 # cases. The current Perl seems to tolerate spaces.
7909 # Space between keyword and '('
7910 elsif ( $last_type eq 'k' ) {
7912 unless ( $rOpts_space_keyword_paren
7913 || $space_after_keyword{$last_token} );
7916 # Space between function and '('
7917 # -----------------------------------------------------
7918 # 'w' and 'i' checks for something like:
7919 # myfun( &myfun( ->myfun(
7920 # -----------------------------------------------------
7921 elsif (( $last_type =~ /^[wU]$/ )
7922 || ( $last_type =~ /^[wi]$/ && $last_token =~ /^(\&|->)/ ) )
7924 $ws = WS_NO unless ($rOpts_space_function_paren);
7927 # space between something like $i and ( in
7928 # for $i ( 0 .. 20 ) {
7929 # FIXME: eventually, type 'i' needs to be split into multiple
7930 # token types so this can be a hardwired rule.
7931 elsif ( $last_type eq 'i' && $last_token =~ /^[\$\%\@]/ ) {
7935 # allow constant function followed by '()' to retain no space
7936 elsif ( $last_type eq 'C' && $$rtokens[ $j + 1 ] eq ')' ) {
7941 # patch for SWITCH/CASE: make space at ']{' optional
7942 # since the '{' might begin a case or when block
7943 elsif ( ( $token eq '{' && $type ne 'L' ) && $last_token eq ']' ) {
7947 # keep space between 'sub' and '{' for anonymous sub definition
7948 if ( $type eq '{' ) {
7949 if ( $last_token eq 'sub' ) {
7953 # this is needed to avoid no space in '){'
7954 if ( $last_token eq ')' && $token eq '{' ) { $ws = WS_YES }
7956 # avoid any space before the brace or bracket in something like
7957 # @opts{'a','b',...}
7958 if ( $last_type eq 'i' && $last_token =~ /^\@/ ) {
7963 elsif ( $type eq 'i' ) {
7965 # never a space before ->
7966 if ( $token =~ /^\-\>/ ) {
7971 # retain any space between '-' and bare word
7972 elsif ( $type eq 'w' || $type eq 'C' ) {
7973 $ws = WS_OPTIONAL if $last_type eq '-';
7975 # never a space before ->
7976 if ( $token =~ /^\-\>/ ) {
7981 # retain any space between '-' and bare word
7982 # example: avoid space between 'USER' and '-' here:
7983 # $myhash{USER-NAME}='steve';
7984 elsif ( $type eq 'm' || $type eq '-' ) {
7985 $ws = WS_OPTIONAL if ( $last_type eq 'w' );
7988 # always space before side comment
7989 elsif ( $type eq '#' ) { $ws = WS_YES if $j > 0 }
7991 # always preserver whatever space was used after a possible
7992 # filehandle (except _) or here doc operator
7995 && ( ( $last_type eq 'Z' && $last_token ne '_' )
7996 || $last_type eq 'h' )
8003 if FORMATTER_DEBUG_FLAG_WHITE;
8005 #---------------------------------------------------------------
8007 # default rules not covered above
8008 #---------------------------------------------------------------
8009 # if we fall through to here,
8010 # look at the pre-defined hash tables for the two tokens, and
8011 # if (they are equal) use the common value
8012 # if (either is zero or undef) use the other
8013 # if (either is -1) use it
8027 if ( !defined($ws) ) {
8028 my $wl = $want_left_space{$type};
8029 my $wr = $want_right_space{$last_type};
8030 if ( !defined($wl) ) { $wl = 0 }
8031 if ( !defined($wr) ) { $wr = 0 }
8032 $ws = ( ( $wl == $wr ) || ( $wl == -1 ) || !$wr ) ? $wl : $wr;
8035 if ( !defined($ws) ) {
8038 "WS flag is undefined for tokens $last_token $token\n");
8041 # Treat newline as a whitespace. Otherwise, we might combine
8042 # 'Send' and '-recipients' here according to the above rules:
8043 # my $msg = new Fax::Send
8044 # -recipients => $to,
8046 if ( $ws == 0 && $j == 0 ) { $ws = 1 }
8051 && ( $last_type !~ /^[Zh]$/ ) )
8054 # If this happens, we have a non-fatal but undesirable
8055 # hole in the above rules which should be patched.
8057 "WS flag is zero for tokens $last_token $token\n");
8059 $white_space_flag[$j] = $ws;
8061 FORMATTER_DEBUG_FLAG_WHITE && do {
8062 my $str = substr( $last_token, 0, 15 );
8063 $str .= ' ' x ( 16 - length($str) );
8064 if ( !defined($ws_1) ) { $ws_1 = "*" }
8065 if ( !defined($ws_2) ) { $ws_2 = "*" }
8066 if ( !defined($ws_3) ) { $ws_3 = "*" }
8067 if ( !defined($ws_4) ) { $ws_4 = "*" }
8069 "WHITE: i=$j $str $last_type $type $ws_1 : $ws_2 : $ws_3 : $ws_4 : $ws \n";
8072 return \@white_space_flag;
8075 { # begin print_line_of_tokens
8082 my $rcontainer_type;
8083 my $rcontainer_environment;
8086 my $rnesting_tokens;
8088 my $rnesting_blocks;
8091 my $python_indentation_level;
8093 # These local token variables are stored by store_token_to_go:
8096 my $container_environment;
8098 my $in_continued_quote;
8101 my $no_internal_newlines;
8107 # routine to pull the jth token from the line of tokens
8110 $token = $$rtokens[$j];
8111 $type = $$rtoken_type[$j];
8112 $block_type = $$rblock_type[$j];
8113 $container_type = $$rcontainer_type[$j];
8114 $container_environment = $$rcontainer_environment[$j];
8115 $type_sequence = $$rtype_sequence[$j];
8116 $level = $$rlevels[$j];
8117 $slevel = $$rslevels[$j];
8118 $nesting_blocks = $$rnesting_blocks[$j];
8119 $ci_level = $$rci_levels[$j];
8125 sub save_current_token {
8128 $block_type, $ci_level,
8129 $container_environment, $container_type,
8130 $in_continued_quote, $level,
8131 $nesting_blocks, $no_internal_newlines,
8133 $type, $type_sequence,
8137 sub restore_current_token {
8139 $block_type, $ci_level,
8140 $container_environment, $container_type,
8141 $in_continued_quote, $level,
8142 $nesting_blocks, $no_internal_newlines,
8144 $type, $type_sequence,
8149 # Routine to place the current token into the output stream.
8150 # Called once per output token.
8151 sub store_token_to_go {
8153 my $flag = $no_internal_newlines;
8154 if ( $_[0] ) { $flag = 1 }
8156 $tokens_to_go[ ++$max_index_to_go ] = $token;
8157 $types_to_go[$max_index_to_go] = $type;
8158 $nobreak_to_go[$max_index_to_go] = $flag;
8159 $old_breakpoint_to_go[$max_index_to_go] = 0;
8160 $forced_breakpoint_to_go[$max_index_to_go] = 0;
8161 $block_type_to_go[$max_index_to_go] = $block_type;
8162 $type_sequence_to_go[$max_index_to_go] = $type_sequence;
8163 $container_environment_to_go[$max_index_to_go] = $container_environment;
8164 $nesting_blocks_to_go[$max_index_to_go] = $nesting_blocks;
8165 $ci_levels_to_go[$max_index_to_go] = $ci_level;
8166 $mate_index_to_go[$max_index_to_go] = -1;
8167 $matching_token_to_go[$max_index_to_go] = '';
8169 # Note: negative levels are currently retained as a diagnostic so that
8170 # the 'final indentation level' is correctly reported for bad scripts.
8171 # But this means that every use of $level as an index must be checked.
8172 # If this becomes too much of a problem, we might give up and just clip
8174 ## $levels_to_go[$max_index_to_go] = ( $level > 0 ) ? $level : 0;
8175 $levels_to_go[$max_index_to_go] = $level;
8176 $nesting_depth_to_go[$max_index_to_go] = ( $slevel >= 0 ) ? $slevel : 0;
8177 $lengths_to_go[ $max_index_to_go + 1 ] =
8178 $lengths_to_go[$max_index_to_go] + length($token);
8180 # Define the indentation that this token would have if it started
8181 # a new line. We have to do this now because we need to know this
8182 # when considering one-line blocks.
8183 set_leading_whitespace( $level, $ci_level, $in_continued_quote );
8185 if ( $type ne 'b' ) {
8186 $last_last_nonblank_index_to_go = $last_nonblank_index_to_go;
8187 $last_last_nonblank_type_to_go = $last_nonblank_type_to_go;
8188 $last_last_nonblank_token_to_go = $last_nonblank_token_to_go;
8189 $last_nonblank_index_to_go = $max_index_to_go;
8190 $last_nonblank_type_to_go = $type;
8191 $last_nonblank_token_to_go = $token;
8192 if ( $type eq ',' ) {
8193 $comma_count_in_batch++;
8197 FORMATTER_DEBUG_FLAG_STORE && do {
8198 my ( $a, $b, $c ) = caller();
8200 "STORE: from $a $c: storing token $token type $type lev=$level slev=$slevel at $max_index_to_go\n";
8204 sub insert_new_token_to_go {
8206 # insert a new token into the output stream. use same level as
8207 # previous token; assumes a character at max_index_to_go.
8208 save_current_token();
8209 ( $token, $type, $slevel, $no_internal_newlines ) = @_;
8211 if ( $max_index_to_go == UNDEFINED_INDEX ) {
8212 warning("code bug: bad call to insert_new_token_to_go\n");
8214 $level = $levels_to_go[$max_index_to_go];
8216 # FIXME: it seems to be necessary to use the next, rather than
8217 # previous, value of this variable when creating a new blank (align.t)
8218 #my $slevel = $nesting_depth_to_go[$max_index_to_go];
8219 $nesting_blocks = $nesting_blocks_to_go[$max_index_to_go];
8220 $ci_level = $ci_levels_to_go[$max_index_to_go];
8221 $container_environment = $container_environment_to_go[$max_index_to_go];
8222 $in_continued_quote = 0;
8224 $type_sequence = "";
8225 store_token_to_go();
8226 restore_current_token();
8230 sub print_line_of_tokens {
8232 my $line_of_tokens = shift;
8234 # This routine is called once per input line to process all of
8235 # the tokens on that line. This is the first stage of
8238 # Full-line comments and blank lines may be processed immediately.
8240 # For normal lines of code, the tokens are stored one-by-one,
8241 # via calls to 'sub store_token_to_go', until a known line break
8242 # point is reached. Then, the batch of collected tokens is
8243 # passed along to 'sub output_line_to_go' for further
8244 # processing. This routine decides if there should be
8245 # whitespace between each pair of non-white tokens, so later
8246 # routines only need to decide on any additional line breaks.
8247 # Any whitespace is initally a single space character. Later,
8248 # the vertical aligner may expand that to be multiple space
8249 # characters if necessary for alignment.
8251 # extract input line number for error messages
8252 $input_line_number = $line_of_tokens->{_line_number};
8254 $rtoken_type = $line_of_tokens->{_rtoken_type};
8255 $rtokens = $line_of_tokens->{_rtokens};
8256 $rlevels = $line_of_tokens->{_rlevels};
8257 $rslevels = $line_of_tokens->{_rslevels};
8258 $rblock_type = $line_of_tokens->{_rblock_type};
8259 $rcontainer_type = $line_of_tokens->{_rcontainer_type};
8260 $rcontainer_environment = $line_of_tokens->{_rcontainer_environment};
8261 $rtype_sequence = $line_of_tokens->{_rtype_sequence};
8262 $input_line = $line_of_tokens->{_line_text};
8263 $rnesting_tokens = $line_of_tokens->{_rnesting_tokens};
8264 $rci_levels = $line_of_tokens->{_rci_levels};
8265 $rnesting_blocks = $line_of_tokens->{_rnesting_blocks};
8267 $in_continued_quote = $starting_in_quote =
8268 $line_of_tokens->{_starting_in_quote};
8269 $in_quote = $line_of_tokens->{_ending_in_quote};
8270 $ending_in_quote = $in_quote;
8271 $python_indentation_level =
8272 $line_of_tokens->{_python_indentation_level};
8277 my $next_nonblank_token;
8278 my $next_nonblank_token_type;
8279 my $rwhite_space_flag;
8281 $jmax = @$rtokens - 1;
8283 $container_type = "";
8284 $container_environment = "";
8285 $type_sequence = "";
8286 $no_internal_newlines = 1 - $rOpts_add_newlines;
8287 $is_static_block_comment = 0;
8289 # Handle a continued quote..
8290 if ($in_continued_quote) {
8292 # A line which is entirely a quote or pattern must go out
8293 # verbatim. Note: the \n is contained in $input_line.
8295 if ( ( $input_line =~ "\t" ) ) {
8296 note_embedded_tab();
8298 write_unindented_line("$input_line");
8299 $last_line_had_side_comment = 0;
8303 # prior to version 20010406, perltidy had a bug which placed
8304 # continuation indentation before the last line of some multiline
8305 # quotes and patterns -- exactly the lines passing this way.
8306 # To help find affected lines in scripts run with these
8307 # versions, run with '-chk', and it will warn of any quotes or
8308 # patterns which might have been modified by these early
8310 if ( $rOpts->{'check-multiline-quotes'} && $input_line =~ /^ / ) {
8312 "-chk: please check this line for extra leading whitespace\n"
8317 # Write line verbatim if we are in a formatting skip section
8318 if ($in_format_skipping_section) {
8319 write_unindented_line("$input_line");
8320 $last_line_had_side_comment = 0;
8322 # Note: extra space appended to comment simplifies pattern matching
8324 && $$rtoken_type[0] eq '#'
8325 && ( $$rtokens[0] . " " ) =~ /$format_skipping_pattern_end/o )
8327 $in_format_skipping_section = 0;
8328 write_logfile_entry("Exiting formatting skip section\n");
8333 # See if we are entering a formatting skip section
8334 if ( $rOpts_format_skipping
8336 && $$rtoken_type[0] eq '#'
8337 && ( $$rtokens[0] . " " ) =~ /$format_skipping_pattern_begin/o )
8340 $in_format_skipping_section = 1;
8341 write_logfile_entry("Entering formatting skip section\n");
8342 write_unindented_line("$input_line");
8343 $last_line_had_side_comment = 0;
8347 # delete trailing blank tokens
8348 if ( $jmax > 0 && $$rtoken_type[$jmax] eq 'b' ) { $jmax-- }
8350 # Handle a blank line..
8353 # For the 'swallow-optional-blank-lines' option, we delete all
8354 # old blank lines and let the blank line rules generate any
8356 if ( !$rOpts_swallow_optional_blank_lines ) {
8358 $file_writer_object->write_blank_code_line();
8359 $last_line_leading_type = 'b';
8361 $last_line_had_side_comment = 0;
8365 # see if this is a static block comment (starts with ## by default)
8366 my $is_static_block_comment_without_leading_space = 0;
8368 && $$rtoken_type[0] eq '#'
8369 && $rOpts->{'static-block-comments'}
8370 && $input_line =~ /$static_block_comment_pattern/o )
8372 $is_static_block_comment = 1;
8373 $is_static_block_comment_without_leading_space =
8374 substr( $input_line, 0, 1 ) eq '#';
8377 # create a hanging side comment if appropriate
8380 && $$rtoken_type[0] eq '#' # only token is a comment
8381 && $last_line_had_side_comment # last line had side comment
8382 && $input_line =~ /^\s/ # there is some leading space
8383 && !$is_static_block_comment # do not make static comment hanging
8384 && $rOpts->{'hanging-side-comments'} # user is allowing this
8388 # We will insert an empty qw string at the start of the token list
8389 # to force this comment to be a side comment. The vertical aligner
8390 # should then line it up with the previous side comment.
8391 unshift @$rtoken_type, 'q';
8392 unshift @$rtokens, '';
8393 unshift @$rlevels, $$rlevels[0];
8394 unshift @$rslevels, $$rslevels[0];
8395 unshift @$rblock_type, '';
8396 unshift @$rcontainer_type, '';
8397 unshift @$rcontainer_environment, '';
8398 unshift @$rtype_sequence, '';
8399 unshift @$rnesting_tokens, $$rnesting_tokens[0];
8400 unshift @$rci_levels, $$rci_levels[0];
8401 unshift @$rnesting_blocks, $$rnesting_blocks[0];
8405 # remember if this line has a side comment
8406 $last_line_had_side_comment =
8407 ( $jmax > 0 && $$rtoken_type[$jmax] eq '#' );
8409 # Handle a block (full-line) comment..
8410 if ( ( $jmax == 0 ) && ( $$rtoken_type[0] eq '#' ) ) {
8412 if ( $rOpts->{'delete-block-comments'} ) { return }
8414 if ( $rOpts->{'tee-block-comments'} ) {
8415 $file_writer_object->tee_on();
8418 destroy_one_line_block();
8419 output_line_to_go();
8421 # output a blank line before block comments
8423 $last_line_leading_type !~ /^[#b]$/
8424 && $rOpts->{'blanks-before-comments'} # only if allowed
8426 $is_static_block_comment # never before static block comments
8429 flush(); # switching to new output stream
8430 $file_writer_object->write_blank_code_line();
8431 $last_line_leading_type = 'b';
8434 # TRIM COMMENTS -- This could be turned off as a option
8435 $$rtokens[0] =~ s/\s*$//; # trim right end
8438 $rOpts->{'indent-block-comments'}
8439 && ( !$rOpts->{'indent-spaced-block-comments'}
8440 || $input_line =~ /^\s+/ )
8441 && !$is_static_block_comment_without_leading_space
8445 store_token_to_go();
8446 output_line_to_go();
8449 flush(); # switching to new output stream
8450 $file_writer_object->write_code_line( $$rtokens[0] . "\n" );
8451 $last_line_leading_type = '#';
8453 if ( $rOpts->{'tee-block-comments'} ) {
8454 $file_writer_object->tee_off();
8459 # compare input/output indentation except for continuation lines
8460 # (because they have an unknown amount of initial blank space)
8461 # and lines which are quotes (because they may have been outdented)
8462 # Note: this test is placed here because we know the continuation flag
8463 # at this point, which allows us to avoid non-meaningful checks.
8464 my $structural_indentation_level = $$rlevels[0];
8465 compare_indentation_levels( $python_indentation_level,
8466 $structural_indentation_level )
8467 unless ( $python_indentation_level < 0
8468 || ( $$rci_levels[0] > 0 )
8469 || ( ( $python_indentation_level == 0 ) && $$rtoken_type[0] eq 'Q' )
8472 # Patch needed for MakeMaker. Do not break a statement
8473 # in which $VERSION may be calculated. See MakeMaker.pm;
8474 # this is based on the coding in it.
8475 # The first line of a file that matches this will be eval'd:
8476 # /([\$*])(([\w\:\']*)\bVERSION)\b.*\=/
8478 # *VERSION = \'1.01';
8479 # ( $VERSION ) = '$Revision: 1.61 $ ' =~ /\$Revision:\s+([^\s]+)/;
8480 # We will pass such a line straight through without breaking
8481 # it unless -npvl is used
8483 my $is_VERSION_statement = 0;
8486 !$saw_VERSION_in_this_file
8487 && $input_line =~ /VERSION/ # quick check to reject most lines
8488 && $input_line =~ /([\$*])(([\w\:\']*)\bVERSION)\b.*\=/
8491 $saw_VERSION_in_this_file = 1;
8492 $is_VERSION_statement = 1;
8493 write_logfile_entry("passing VERSION line; -npvl deactivates\n");
8494 $no_internal_newlines = 1;
8497 # take care of indentation-only
8498 # NOTE: In previous versions we sent all qw lines out immediately here.
8499 # No longer doing this: also write a line which is entirely a 'qw' list
8500 # to allow stacking of opening and closing tokens. Note that interior
8501 # qw lines will still go out at the end of this routine.
8502 if ( $rOpts->{'indent-only'} ) {
8507 $token = $input_line;
8510 $container_type = "";
8511 $container_environment = "";
8512 $type_sequence = "";
8513 store_token_to_go();
8514 output_line_to_go();
8518 push( @$rtokens, ' ', ' ' ); # making $j+2 valid simplifies coding
8519 push( @$rtoken_type, 'b', 'b' );
8520 ($rwhite_space_flag) =
8521 set_white_space_flag( $jmax, $rtokens, $rtoken_type, $rblock_type );
8523 # find input tabbing to allow checks for tabbing disagreement
8525 ##$input_line_tabbing = "";
8526 ##if ( $input_line =~ /^(\s*)/ ) { $input_line_tabbing = $1; }
8528 # if the buffer hasn't been flushed, add a leading space if
8529 # necessary to keep essential whitespace. This is really only
8530 # necessary if we are squeezing out all ws.
8531 if ( $max_index_to_go >= 0 ) {
8533 $old_line_count_in_batch++;
8536 is_essential_whitespace(
8537 $last_last_nonblank_token,
8538 $last_last_nonblank_type,
8539 $tokens_to_go[$max_index_to_go],
8540 $types_to_go[$max_index_to_go],
8546 my $slevel = $$rslevels[0];
8547 insert_new_token_to_go( ' ', 'b', $slevel,
8548 $no_internal_newlines );
8552 # If we just saw the end of an elsif block, write nag message
8553 # if we do not see another elseif or an else.
8554 if ($looking_for_else) {
8556 unless ( $$rtokens[0] =~ /^(elsif|else)$/ ) {
8557 write_logfile_entry("(No else block)\n");
8559 $looking_for_else = 0;
8562 # This is a good place to kill incomplete one-line blocks
8563 if ( ( $semicolons_before_block_self_destruct == 0 )
8564 && ( $max_index_to_go >= 0 )
8565 && ( $types_to_go[$max_index_to_go] eq ';' )
8566 && ( $$rtokens[0] ne '}' ) )
8568 destroy_one_line_block();
8569 output_line_to_go();
8572 # loop to process the tokens one-by-one
8576 foreach $j ( 0 .. $jmax ) {
8578 # pull out the local values for this token
8581 if ( $type eq '#' ) {
8583 # trim trailing whitespace
8584 # (there is no option at present to prevent this)
8588 $rOpts->{'delete-side-comments'}
8590 # delete closing side comments if necessary
8591 || ( $rOpts->{'delete-closing-side-comments'}
8592 && $token =~ /$closing_side_comment_prefix_pattern/o
8593 && $last_nonblank_block_type =~
8594 /$closing_side_comment_list_pattern/o )
8597 if ( $types_to_go[$max_index_to_go] eq 'b' ) {
8598 unstore_token_to_go();
8604 # If we are continuing after seeing a right curly brace, flush
8605 # buffer unless we see what we are looking for, as in
8607 if ( $rbrace_follower && $type ne 'b' ) {
8609 unless ( $rbrace_follower->{$token} ) {
8610 output_line_to_go();
8612 $rbrace_follower = undef;
8615 $j_next = ( $$rtoken_type[ $j + 1 ] eq 'b' ) ? $j + 2 : $j + 1;
8616 $next_nonblank_token = $$rtokens[$j_next];
8617 $next_nonblank_token_type = $$rtoken_type[$j_next];
8619 #--------------------------------------------------------
8620 # Start of section to patch token text
8621 #--------------------------------------------------------
8623 # Modify certain tokens here for whitespace
8624 # The following is not yet done, but could be:
8626 if ( $type =~ /^[wit]$/ ) {
8629 # change '$ var' to '$var' etc
8630 # '-> new' to '->new'
8631 if ( $token =~ /^([\$\&\%\*\@]|\-\>)\s/ ) {
8635 if ( $token =~ /^sub/ ) { $token =~ s/\s+/ /g }
8638 # change 'LABEL :' to 'LABEL:'
8639 elsif ( $type eq 'J' ) { $token =~ s/\s+//g }
8641 # patch to add space to something like "x10"
8642 # This avoids having to split this token in the pre-tokenizer
8643 elsif ( $type eq 'n' ) {
8644 if ( $token =~ /^x\d+/ ) { $token =~ s/x/x / }
8647 elsif ( $type eq 'Q' ) {
8648 note_embedded_tab() if ( $token =~ "\t" );
8650 # make note of something like '$var = s/xxx/yyy/;'
8651 # in case it should have been '$var =~ s/xxx/yyy/;'
8653 $token =~ /^(s|tr|y|m|\/)/
8654 && $last_nonblank_token =~ /^(=|==|!=)$/
8656 # precededed by simple scalar
8657 && $last_last_nonblank_type eq 'i'
8658 && $last_last_nonblank_token =~ /^\$/
8660 # followed by some kind of termination
8661 # (but give complaint if we can's see far enough ahead)
8662 && $next_nonblank_token =~ /^[; \)\}]$/
8664 # scalar is not decleared
8666 $types_to_go[0] eq 'k'
8667 && $tokens_to_go[0] =~ /^(my|our|local)$/
8671 my $guess = substr( $last_nonblank_token, 0, 1 ) . '~';
8673 "Note: be sure you want '$last_nonblank_token' instead of '$guess' here\n"
8678 # trim blanks from right of qw quotes
8679 # (To avoid trimming qw quotes use -ntqw; the tokenizer handles this)
8680 elsif ( $type eq 'q' ) {
8682 note_embedded_tab() if ( $token =~ "\t" );
8685 #--------------------------------------------------------
8686 # End of section to patch token text
8687 #--------------------------------------------------------
8689 # insert any needed whitespace
8690 if ( ( $type ne 'b' )
8691 && ( $max_index_to_go >= 0 )
8692 && ( $types_to_go[$max_index_to_go] ne 'b' )
8693 && $rOpts_add_whitespace )
8695 my $ws = $$rwhite_space_flag[$j];
8698 insert_new_token_to_go( ' ', 'b', $slevel,
8699 $no_internal_newlines );
8703 # Do not allow breaks which would promote a side comment to a
8704 # block comment. In order to allow a break before an opening
8705 # or closing BLOCK, followed by a side comment, those sections
8706 # of code will handle this flag separately.
8707 my $side_comment_follows = ( $next_nonblank_token_type eq '#' );
8708 my $is_opening_BLOCK =
8712 && $block_type ne 't' );
8713 my $is_closing_BLOCK =
8717 && $block_type ne 't' );
8719 if ( $side_comment_follows
8720 && !$is_opening_BLOCK
8721 && !$is_closing_BLOCK )
8723 $no_internal_newlines = 1;
8726 # We're only going to handle breaking for code BLOCKS at this
8727 # (top) level. Other indentation breaks will be handled by
8728 # sub scan_list, which is better suited to dealing with them.
8729 if ($is_opening_BLOCK) {
8731 # Tentatively output this token. This is required before
8732 # calling starting_one_line_block. We may have to unstore
8733 # it, though, if we have to break before it.
8734 store_token_to_go($side_comment_follows);
8736 # Look ahead to see if we might form a one-line block
8738 starting_one_line_block( $j, $jmax, $level, $slevel,
8739 $ci_level, $rtokens, $rtoken_type, $rblock_type );
8740 clear_breakpoint_undo_stack();
8742 # to simplify the logic below, set a flag to indicate if
8743 # this opening brace is far from the keyword which introduces it
8744 my $keyword_on_same_line = 1;
8745 if ( ( $max_index_to_go >= 0 )
8746 && ( $last_nonblank_type eq ')' ) )
8748 if ( $block_type =~ /^(if|else|elsif)$/
8749 && ( $tokens_to_go[0] eq '}' )
8750 && $rOpts_cuddled_else )
8752 $keyword_on_same_line = 1;
8754 elsif ( ( $slevel < $nesting_depth_to_go[0] ) || $too_long )
8756 $keyword_on_same_line = 0;
8760 # decide if user requested break before '{'
8763 # use -bl flag if not a sub block of any type
8764 $block_type !~ /^sub/
8765 ? $rOpts->{'opening-brace-on-new-line'}
8767 # use -sbl flag unless this is an anonymous sub block
8768 : $block_type !~ /^sub\W*$/
8769 ? $rOpts->{'opening-sub-brace-on-new-line'}
8771 # do not break for anonymous subs
8774 # Break before an opening '{' ...
8780 # and we were unable to start looking for a block,
8781 && $index_start_one_line_block == UNDEFINED_INDEX
8783 # or if it will not be on same line as its keyword, so that
8784 # it will be outdented (eval.t, overload.t), and the user
8785 # has not insisted on keeping it on the right
8786 || ( !$keyword_on_same_line
8787 && !$rOpts->{'opening-brace-always-on-right'} )
8792 # but only if allowed
8793 unless ($no_internal_newlines) {
8795 # since we already stored this token, we must unstore it
8796 unstore_token_to_go();
8798 # then output the line
8799 output_line_to_go();
8801 # and now store this token at the start of a new line
8802 store_token_to_go($side_comment_follows);
8806 # Now update for side comment
8807 if ($side_comment_follows) { $no_internal_newlines = 1 }
8809 # now output this line
8810 unless ($no_internal_newlines) {
8811 output_line_to_go();
8815 elsif ($is_closing_BLOCK) {
8817 # If there is a pending one-line block ..
8818 if ( $index_start_one_line_block != UNDEFINED_INDEX ) {
8820 # we have to terminate it if..
8823 # it is too long (final length may be different from
8824 # initial estimate). note: must allow 1 space for this token
8825 excess_line_length( $index_start_one_line_block,
8826 $max_index_to_go ) >= 0
8828 # or if it has too many semicolons
8829 || ( $semicolons_before_block_self_destruct == 0
8830 && $last_nonblank_type ne ';' )
8833 destroy_one_line_block();
8837 # put a break before this closing curly brace if appropriate
8838 unless ( $no_internal_newlines
8839 || $index_start_one_line_block != UNDEFINED_INDEX )
8842 # add missing semicolon if ...
8843 # there are some tokens
8845 ( $max_index_to_go > 0 )
8847 # and we don't have one
8848 && ( $last_nonblank_type ne ';' )
8850 # patch until some block type issues are fixed:
8851 # Do not add semi-colon for block types '{',
8852 # '}', and ';' because we cannot be sure yet
8853 # that this is a block and not an anonomyous
8854 # hash (blktype.t, blktype1.t)
8855 && ( $block_type !~ /^[\{\};]$/ )
8857 # it seems best not to add semicolons in these
8858 # special block types: sort|map|grep
8859 && ( !$is_sort_map_grep{$block_type} )
8861 # and we are allowed to do so.
8862 && $rOpts->{'add-semicolons'}
8866 save_current_token();
8869 $level = $levels_to_go[$max_index_to_go];
8870 $slevel = $nesting_depth_to_go[$max_index_to_go];
8872 $nesting_blocks_to_go[$max_index_to_go];
8873 $ci_level = $ci_levels_to_go[$max_index_to_go];
8875 $container_type = "";
8876 $container_environment = "";
8877 $type_sequence = "";
8879 # Note - we remove any blank AFTER extracting its
8880 # parameters such as level, etc, above
8881 if ( $types_to_go[$max_index_to_go] eq 'b' ) {
8882 unstore_token_to_go();
8884 store_token_to_go();
8886 note_added_semicolon();
8887 restore_current_token();
8890 # then write out everything before this closing curly brace
8891 output_line_to_go();
8895 # Now update for side comment
8896 if ($side_comment_follows) { $no_internal_newlines = 1 }
8898 # store the closing curly brace
8899 store_token_to_go();
8901 # ok, we just stored a closing curly brace. Often, but
8902 # not always, we want to end the line immediately.
8903 # So now we have to check for special cases.
8905 # if this '}' successfully ends a one-line block..
8906 my $is_one_line_block = 0;
8908 if ( $index_start_one_line_block != UNDEFINED_INDEX ) {
8910 # Remember the type of token just before the
8911 # opening brace. It would be more general to use
8912 # a stack, but this will work for one-line blocks.
8913 $is_one_line_block =
8914 $types_to_go[$index_start_one_line_block];
8916 # we have to actually make it by removing tentative
8917 # breaks that were set within it
8918 undo_forced_breakpoint_stack(0);
8919 set_nobreaks( $index_start_one_line_block,
8920 $max_index_to_go - 1 );
8922 # then re-initialize for the next one-line block
8923 destroy_one_line_block();
8925 # then decide if we want to break after the '}' ..
8926 # We will keep going to allow certain brace followers as in:
8927 # do { $ifclosed = 1; last } unless $losing;
8929 # But make a line break if the curly ends a
8930 # significant block:
8932 $is_block_without_semicolon{$block_type}
8934 # if needless semicolon follows we handle it later
8935 && $next_nonblank_token ne ';'
8938 output_line_to_go() unless ($no_internal_newlines);
8942 # set string indicating what we need to look for brace follower
8944 if ( $block_type eq 'do' ) {
8945 $rbrace_follower = \%is_do_follower;
8947 elsif ( $block_type =~ /^(if|elsif|unless)$/ ) {
8948 $rbrace_follower = \%is_if_brace_follower;
8950 elsif ( $block_type eq 'else' ) {
8951 $rbrace_follower = \%is_else_brace_follower;
8954 # added eval for borris.t
8955 elsif ($is_sort_map_grep_eval{$block_type}
8956 || $is_one_line_block eq 'G' )
8958 $rbrace_follower = undef;
8963 elsif ( $block_type =~ /^sub\W*$/ ) {
8965 if ($is_one_line_block) {
8966 $rbrace_follower = \%is_anon_sub_1_brace_follower;
8969 $rbrace_follower = \%is_anon_sub_brace_follower;
8973 # None of the above: specify what can follow a closing
8974 # brace of a block which is not an
8975 # if/elsif/else/do/sort/map/grep/eval
8977 # 'Toolbar.pm', 'Menubar.pm', bless.t, '3rules.pl', 'break1.t
8979 $rbrace_follower = \%is_other_brace_follower;
8982 # See if an elsif block is followed by another elsif or else;
8984 if ( $block_type eq 'elsif' ) {
8986 if ( $next_nonblank_token_type eq 'b' ) { # end of line?
8987 $looking_for_else = 1; # ok, check on next line
8991 unless ( $next_nonblank_token =~ /^(elsif|else)$/ ) {
8992 write_logfile_entry("No else block :(\n");
8997 # keep going after certain block types (map,sort,grep,eval)
8998 # added eval for borris.t
9004 # if no more tokens, postpone decision until re-entring
9005 elsif ( ( $next_nonblank_token_type eq 'b' )
9006 && $rOpts_add_newlines )
9008 unless ($rbrace_follower) {
9009 output_line_to_go() unless ($no_internal_newlines);
9013 elsif ($rbrace_follower) {
9015 unless ( $rbrace_follower->{$next_nonblank_token} ) {
9016 output_line_to_go() unless ($no_internal_newlines);
9018 $rbrace_follower = undef;
9022 output_line_to_go() unless ($no_internal_newlines);
9025 } # end treatment of closing block token
9028 elsif ( $type eq ';' ) {
9030 # kill one-line blocks with too many semicolons
9031 $semicolons_before_block_self_destruct--;
9033 ( $semicolons_before_block_self_destruct < 0 )
9034 || ( $semicolons_before_block_self_destruct == 0
9035 && $next_nonblank_token_type !~ /^[b\}]$/ )
9038 destroy_one_line_block();
9041 # Remove unnecessary semicolons, but not after bare
9042 # blocks, where it could be unsafe if the brace is
9046 $last_nonblank_token eq '}'
9048 $is_block_without_semicolon{
9049 $last_nonblank_block_type}
9050 || $last_nonblank_block_type =~ /^sub\s+\w/
9051 || $last_nonblank_block_type =~ /^\w+:$/ )
9053 || $last_nonblank_type eq ';'
9058 $rOpts->{'delete-semicolons'}
9060 # don't delete ; before a # because it would promote it
9061 # to a block comment
9062 && ( $next_nonblank_token_type ne '#' )
9065 note_deleted_semicolon();
9067 unless ( $no_internal_newlines
9068 || $index_start_one_line_block != UNDEFINED_INDEX );
9072 write_logfile_entry("Extra ';'\n");
9075 store_token_to_go();
9078 unless ( $no_internal_newlines
9079 || ( $next_nonblank_token eq '}' ) );
9083 # handle here_doc target string
9084 elsif ( $type eq 'h' ) {
9085 $no_internal_newlines =
9086 1; # no newlines after seeing here-target
9087 destroy_one_line_block();
9088 store_token_to_go();
9091 # handle all other token types
9094 # if this is a blank...
9095 if ( $type eq 'b' ) {
9097 # make it just one character
9098 $token = ' ' if $rOpts_add_whitespace;
9100 # delete it if unwanted by whitespace rules
9101 # or we are deleting all whitespace
9102 my $ws = $$rwhite_space_flag[ $j + 1 ];
9103 if ( ( defined($ws) && $ws == -1 )
9104 || $rOpts_delete_old_whitespace )
9107 # unless it might make a syntax error
9109 unless is_essential_whitespace(
9110 $last_last_nonblank_token,
9111 $last_last_nonblank_type,
9112 $tokens_to_go[$max_index_to_go],
9113 $types_to_go[$max_index_to_go],
9114 $$rtokens[ $j + 1 ],
9115 $$rtoken_type[ $j + 1 ]
9119 store_token_to_go();
9122 # remember two previous nonblank OUTPUT tokens
9123 if ( $type ne '#' && $type ne 'b' ) {
9124 $last_last_nonblank_token = $last_nonblank_token;
9125 $last_last_nonblank_type = $last_nonblank_type;
9126 $last_nonblank_token = $token;
9127 $last_nonblank_type = $type;
9128 $last_nonblank_block_type = $block_type;
9131 # unset the continued-quote flag since it only applies to the
9132 # first token, and we want to resume normal formatting if
9133 # there are additional tokens on the line
9134 $in_continued_quote = 0;
9136 } # end of loop over all tokens in this 'line_of_tokens'
9138 # we have to flush ..
9141 # if there is a side comment
9142 ( ( $type eq '#' ) && !$rOpts->{'delete-side-comments'} )
9144 # if this line ends in a quote
9145 # NOTE: This is critically important for insuring that quoted lines
9146 # do not get processed by things like -sot and -sct
9149 # if this is a VERSION statement
9150 || $is_VERSION_statement
9152 # to keep a label on one line if that is how it is now
9153 || ( ( $type eq 'J' ) && ( $max_index_to_go == 0 ) )
9155 # if we are instructed to keep all old line breaks
9156 || !$rOpts->{'delete-old-newlines'}
9159 destroy_one_line_block();
9160 output_line_to_go();
9163 # mark old line breakpoints in current output stream
9164 if ( $max_index_to_go >= 0 && !$rOpts_ignore_old_breakpoints ) {
9165 $old_breakpoint_to_go[$max_index_to_go] = 1;
9167 } # end sub print_line_of_tokens
9168 } # end print_line_of_tokens
9170 # sub output_line_to_go sends one logical line of tokens on down the
9171 # pipeline to the VerticalAligner package, breaking the line into continuation
9172 # lines as necessary. The line of tokens is ready to go in the "to_go"
9174 sub output_line_to_go {
9176 # debug stuff; this routine can be called from many points
9177 FORMATTER_DEBUG_FLAG_OUTPUT && do {
9178 my ( $a, $b, $c ) = caller;
9180 "OUTPUT: output_line_to_go called: $a $c $last_nonblank_type $last_nonblank_token, one_line=$index_start_one_line_block, tokens to write=$max_index_to_go\n"
9182 my $output_str = join "", @tokens_to_go[ 0 .. $max_index_to_go ];
9183 write_diagnostics("$output_str\n");
9186 # just set a tentative breakpoint if we might be in a one-line block
9187 if ( $index_start_one_line_block != UNDEFINED_INDEX ) {
9188 set_forced_breakpoint($max_index_to_go);
9192 my $cscw_block_comment;
9193 $cscw_block_comment = add_closing_side_comment()
9194 if ( $rOpts->{'closing-side-comments'} && $max_index_to_go >= 0 );
9196 match_opening_and_closing_tokens();
9198 # tell the -lp option we are outputting a batch so it can close
9199 # any unfinished items in its stack
9202 # If this line ends in a code block brace, set breaks at any
9203 # previous closing code block braces to breakup a chain of code
9204 # blocks on one line. This is very rare but can happen for
9205 # user-defined subs. For example we might be looking at this:
9206 # BOOL { $server_data{uptime} > 0; } NUM { $server_data{load}; } STR {
9207 my $saw_good_break = 0; # flag to force breaks even if short line
9210 # looking for opening or closing block brace
9211 $block_type_to_go[$max_index_to_go]
9213 # but not one of these which are never duplicated on a line:
9214 # until|while|for|if|elsif|else
9215 && !$is_block_without_semicolon{ $block_type_to_go[$max_index_to_go] }
9218 my $lev = $nesting_depth_to_go[$max_index_to_go];
9220 # Walk backwards from the end and
9221 # set break at any closing block braces at the same level.
9222 # But quit if we are not in a chain of blocks.
9223 for ( my $i = $max_index_to_go - 1 ; $i >= 0 ; $i-- ) {
9224 last if ( $levels_to_go[$i] < $lev ); # stop at a lower level
9225 next if ( $levels_to_go[$i] > $lev ); # skip past higher level
9227 if ( $block_type_to_go[$i] ) {
9228 if ( $tokens_to_go[$i] eq '}' ) {
9229 set_forced_breakpoint($i);
9230 $saw_good_break = 1;
9234 # quit if we see anything besides words, function, blanks
9236 elsif ( $types_to_go[$i] !~ /^[\(\)Gwib]$/ ) { last }
9241 my $imax = $max_index_to_go;
9243 # trim any blank tokens
9244 if ( $max_index_to_go >= 0 ) {
9245 if ( $types_to_go[$imin] eq 'b' ) { $imin++ }
9246 if ( $types_to_go[$imax] eq 'b' ) { $imax-- }
9249 # anything left to write?
9250 if ( $imin <= $imax ) {
9252 # add a blank line before certain key types
9253 if ( $last_line_leading_type !~ /^[#b]/ ) {
9255 my $leading_token = $tokens_to_go[$imin];
9256 my $leading_type = $types_to_go[$imin];
9258 # blank lines before subs except declarations and one-liners
9259 # MCONVERSION LOCATION - for sub tokenization change
9260 if ( $leading_token =~ /^(sub\s)/ && $leading_type eq 'i' ) {
9261 $want_blank = ( $rOpts->{'blanks-before-subs'} )
9263 terminal_type( \@types_to_go, \@block_type_to_go, $imin,
9264 $imax ) !~ /^[\;\}]$/
9268 # break before all package declarations
9269 # MCONVERSION LOCATION - for tokenizaton change
9270 elsif ($leading_token =~ /^(package\s)/
9271 && $leading_type eq 'i' )
9273 $want_blank = ( $rOpts->{'blanks-before-subs'} );
9276 # break before certain key blocks except one-liners
9277 if ( $leading_token =~ /^(BEGIN|END)$/ && $leading_type eq 'k' ) {
9278 $want_blank = ( $rOpts->{'blanks-before-subs'} )
9280 terminal_type( \@types_to_go, \@block_type_to_go, $imin,
9285 # Break before certain block types if we haven't had a
9286 # break at this level for a while. This is the
9287 # difficult decision..
9288 elsif ($leading_token =~ /^(unless|if|while|until|for|foreach)$/
9289 && $leading_type eq 'k' )
9291 my $lc = $nonblank_lines_at_depth[$last_line_leading_level];
9292 if ( !defined($lc) ) { $lc = 0 }
9294 $want_blank = $rOpts->{'blanks-before-blocks'}
9295 && $lc >= $rOpts->{'long-block-line-count'}
9296 && $file_writer_object->get_consecutive_nonblank_lines() >=
9297 $rOpts->{'long-block-line-count'}
9299 terminal_type( \@types_to_go, \@block_type_to_go, $imin,
9306 # future: send blank line down normal path to VerticalAligner
9307 Perl::Tidy::VerticalAligner::flush();
9308 $file_writer_object->write_blank_code_line();
9312 # update blank line variables and count number of consecutive
9313 # non-blank, non-comment lines at this level
9314 $last_last_line_leading_level = $last_line_leading_level;
9315 $last_line_leading_level = $levels_to_go[$imin];
9316 if ( $last_line_leading_level < 0 ) { $last_line_leading_level = 0 }
9317 $last_line_leading_type = $types_to_go[$imin];
9318 if ( $last_line_leading_level == $last_last_line_leading_level
9319 && $last_line_leading_type ne 'b'
9320 && $last_line_leading_type ne '#'
9321 && defined( $nonblank_lines_at_depth[$last_line_leading_level] ) )
9323 $nonblank_lines_at_depth[$last_line_leading_level]++;
9326 $nonblank_lines_at_depth[$last_line_leading_level] = 1;
9329 FORMATTER_DEBUG_FLAG_FLUSH && do {
9330 my ( $package, $file, $line ) = caller;
9332 "FLUSH: flushing from $package $file $line, types= $types_to_go[$imin] to $types_to_go[$imax]\n";
9335 # add a couple of extra terminal blank tokens
9338 # set all forced breakpoints for good list formatting
9339 my $is_long_line = excess_line_length( $imin, $max_index_to_go ) > 0;
9342 $max_index_to_go > 0
9345 || $old_line_count_in_batch > 1
9346 || is_unbalanced_batch()
9348 $comma_count_in_batch
9349 && ( $rOpts_maximum_fields_per_table > 0
9350 || $rOpts_comma_arrow_breakpoints == 0 )
9355 $saw_good_break ||= scan_list();
9358 # let $ri_first and $ri_last be references to lists of
9359 # first and last tokens of line fragments to output..
9360 my ( $ri_first, $ri_last );
9362 # write a single line if..
9365 # we aren't allowed to add any newlines
9366 !$rOpts_add_newlines
9368 # or, we don't already have an interior breakpoint
9369 # and we didn't see a good breakpoint
9371 !$forced_breakpoint_count
9374 # and this line is 'short'
9379 @$ri_first = ($imin);
9380 @$ri_last = ($imax);
9383 # otherwise use multiple lines
9386 ( $ri_first, $ri_last ) = set_continuation_breaks($saw_good_break);
9388 break_all_chain_tokens( $ri_first, $ri_last );
9390 # now we do a correction step to clean this up a bit
9391 # (The only time we would not do this is for debugging)
9392 if ( $rOpts->{'recombine'} ) {
9393 ( $ri_first, $ri_last ) =
9394 recombine_breakpoints( $ri_first, $ri_last );
9398 # do corrector step if -lp option is used
9400 if ($rOpts_line_up_parentheses) {
9401 $do_not_pad = correct_lp_indentation( $ri_first, $ri_last );
9403 send_lines_to_vertical_aligner( $ri_first, $ri_last, $do_not_pad );
9405 prepare_for_new_input_lines();
9407 # output any new -cscw block comment
9408 if ($cscw_block_comment) {
9410 $file_writer_object->write_code_line( $cscw_block_comment . "\n" );
9414 sub note_added_semicolon {
9415 $last_added_semicolon_at = $input_line_number;
9416 if ( $added_semicolon_count == 0 ) {
9417 $first_added_semicolon_at = $last_added_semicolon_at;
9419 $added_semicolon_count++;
9420 write_logfile_entry("Added ';' here\n");
9423 sub note_deleted_semicolon {
9424 $last_deleted_semicolon_at = $input_line_number;
9425 if ( $deleted_semicolon_count == 0 ) {
9426 $first_deleted_semicolon_at = $last_deleted_semicolon_at;
9428 $deleted_semicolon_count++;
9429 write_logfile_entry("Deleted unnecessary ';'\n"); # i hope ;)
9432 sub note_embedded_tab {
9433 $embedded_tab_count++;
9434 $last_embedded_tab_at = $input_line_number;
9435 if ( !$first_embedded_tab_at ) {
9436 $first_embedded_tab_at = $last_embedded_tab_at;
9439 if ( $embedded_tab_count <= MAX_NAG_MESSAGES ) {
9440 write_logfile_entry("Embedded tabs in quote or pattern\n");
9444 sub starting_one_line_block {
9446 # after seeing an opening curly brace, look for the closing brace
9447 # and see if the entire block will fit on a line. This routine is
9448 # not always right because it uses the old whitespace, so a check
9449 # is made later (at the closing brace) to make sure we really
9450 # have a one-line block. We have to do this preliminary check,
9451 # though, because otherwise we would always break at a semicolon
9452 # within a one-line block if the block contains multiple statements.
9454 my ( $j, $jmax, $level, $slevel, $ci_level, $rtokens, $rtoken_type,
9458 # kill any current block - we can only go 1 deep
9459 destroy_one_line_block();
9462 # 1=distance from start of block to opening brace exceeds line length
9467 # shouldn't happen: there must have been a prior call to
9468 # store_token_to_go to put the opening brace in the output stream
9469 if ( $max_index_to_go < 0 ) {
9470 warning("program bug: store_token_to_go called incorrectly\n");
9471 report_definite_bug();
9475 # cannot use one-line blocks with cuddled else else/elsif lines
9476 if ( ( $tokens_to_go[0] eq '}' ) && $rOpts_cuddled_else ) {
9481 my $block_type = $$rblock_type[$j];
9483 # find the starting keyword for this block (such as 'if', 'else', ...)
9485 if ( $block_type =~ /^[\{\}\;\:]$/ ) {
9486 $i_start = $max_index_to_go;
9489 elsif ( $last_last_nonblank_token_to_go eq ')' ) {
9491 # For something like "if (xxx) {", the keyword "if" will be
9492 # just after the most recent break. This will be 0 unless
9493 # we have just killed a one-line block and are starting another.
9495 $i_start = $index_max_forced_break + 1;
9496 if ( $types_to_go[$i_start] eq 'b' ) {
9500 unless ( $tokens_to_go[$i_start] eq $block_type ) {
9505 # the previous nonblank token should start these block types
9507 ( $last_last_nonblank_token_to_go eq $block_type )
9508 || ( $block_type =~ /^sub/
9509 && $last_last_nonblank_token_to_go =~ /^sub/ )
9512 $i_start = $last_last_nonblank_index_to_go;
9515 # patch for SWITCH/CASE to retain one-line case/when blocks
9516 elsif ( $block_type eq 'case' || $block_type eq 'when' ) {
9517 $i_start = $index_max_forced_break + 1;
9518 if ( $types_to_go[$i_start] eq 'b' ) {
9521 unless ( $tokens_to_go[$i_start] eq $block_type ) {
9530 my $pos = total_line_length( $i_start, $max_index_to_go ) - 1;
9534 # see if length is too long to even start
9535 if ( $pos > $rOpts_maximum_line_length ) {
9539 for ( $i = $j + 1 ; $i <= $jmax ; $i++ ) {
9541 # old whitespace could be arbitrarily large, so don't use it
9542 if ( $$rtoken_type[$i] eq 'b' ) { $pos += 1 }
9543 else { $pos += length( $$rtokens[$i] ) }
9545 # Return false result if we exceed the maximum line length,
9546 if ( $pos > $rOpts_maximum_line_length ) {
9550 # or encounter another opening brace before finding the closing brace.
9551 elsif ($$rtokens[$i] eq '{'
9552 && $$rtoken_type[$i] eq '{'
9553 && $$rblock_type[$i] )
9558 # if we find our closing brace..
9559 elsif ($$rtokens[$i] eq '}'
9560 && $$rtoken_type[$i] eq '}'
9561 && $$rblock_type[$i] )
9564 # be sure any trailing comment also fits on the line
9566 ( $$rtoken_type[ $i + 1 ] eq 'b' ) ? $i + 2 : $i + 1;
9568 if ( $$rtoken_type[$i_nonblank] eq '#' ) {
9569 $pos += length( $$rtokens[$i_nonblank] );
9571 if ( $i_nonblank > $i + 1 ) {
9572 $pos += length( $$rtokens[ $i + 1 ] );
9575 if ( $pos > $rOpts_maximum_line_length ) {
9580 # ok, it's a one-line block
9581 create_one_line_block( $i_start, 20 );
9585 # just keep going for other characters
9590 # Allow certain types of new one-line blocks to form by joining
9591 # input lines. These can be safely done, but for other block types,
9592 # we keep old one-line blocks but do not form new ones. It is not
9593 # always a good idea to make as many one-line blocks as possible,
9594 # so other types are not done. The user can always use -mangle.
9595 if ( $is_sort_map_grep_eval{$block_type} ) {
9596 create_one_line_block( $i_start, 1 );
9602 sub unstore_token_to_go {
9604 # remove most recent token from output stream
9605 if ( $max_index_to_go > 0 ) {
9609 $max_index_to_go = UNDEFINED_INDEX;
9614 sub want_blank_line {
9616 $file_writer_object->want_blank_line();
9619 sub write_unindented_line {
9621 $file_writer_object->write_line( $_[0] );
9626 # If there is a single, long parameter within parens, like this:
9628 # $self->command( "/msg "
9630 # . " You said $1, but did you know that it's square was "
9631 # . $1 * $1 . " ?" );
9633 # we can remove the continuation indentation of the 2nd and higher lines
9634 # to achieve this effect, which is more pleasing:
9636 # $self->command("/msg "
9638 # . " You said $1, but did you know that it's square was "
9639 # . $1 * $1 . " ?");
9641 my ( $line_open, $i_start, $closing_index, $ri_first, $ri_last ) = @_;
9642 my $max_line = @$ri_first - 1;
9644 # must be multiple lines
9645 return unless $max_line > $line_open;
9647 my $lev_start = $levels_to_go[$i_start];
9648 my $ci_start_plus = 1 + $ci_levels_to_go[$i_start];
9650 # see if all additional lines in this container have continuation
9653 my $line_1 = 1 + $line_open;
9654 for ( $n = $line_1 ; $n <= $max_line ; ++$n ) {
9655 my $ibeg = $$ri_first[$n];
9656 my $iend = $$ri_last[$n];
9657 if ( $ibeg eq $closing_index ) { $n--; last }
9658 return if ( $lev_start != $levels_to_go[$ibeg] );
9659 return if ( $ci_start_plus != $ci_levels_to_go[$ibeg] );
9660 last if ( $closing_index <= $iend );
9663 # we can reduce the indentation of all continuation lines
9664 my $continuation_line_count = $n - $line_open;
9665 @ci_levels_to_go[ @$ri_first[ $line_1 .. $n ] ] =
9666 (0) x ($continuation_line_count);
9667 @leading_spaces_to_go[ @$ri_first[ $line_1 .. $n ] ] =
9668 @reduced_spaces_to_go[ @$ri_first[ $line_1 .. $n ] ];
9671 sub set_logical_padding {
9673 # Look at a batch of lines and see if extra padding can improve the
9674 # alignment when there are certain leading operators. Here is an
9675 # example, in which some extra space is introduced before
9676 # '( $year' to make it line up with the subsequent lines:
9678 # if ( ( $Year < 1601 )
9679 # || ( $Year > 2899 )
9680 # || ( $EndYear < 1601 )
9681 # || ( $EndYear > 2899 ) )
9683 # &Error_OutOfRange;
9686 my ( $ri_first, $ri_last ) = @_;
9687 my $max_line = @$ri_first - 1;
9689 my ( $ibeg, $ibeg_next, $ibegm, $iend, $iendm, $ipad, $line, $pad_spaces,
9690 $tok_next, $has_leading_op_next, $has_leading_op );
9692 # looking at each line of this batch..
9693 foreach $line ( 0 .. $max_line - 1 ) {
9695 # see if the next line begins with a logical operator
9696 $ibeg = $$ri_first[$line];
9697 $iend = $$ri_last[$line];
9698 $ibeg_next = $$ri_first[ $line + 1 ];
9699 $tok_next = $tokens_to_go[$ibeg_next];
9700 $has_leading_op_next = $is_chain_operator{$tok_next};
9701 next unless ($has_leading_op_next);
9703 # next line must not be at lesser depth
9705 if ( $nesting_depth_to_go[$ibeg] > $nesting_depth_to_go[$ibeg_next] );
9707 # identify the token in this line to be padded on the left
9710 # handle lines at same depth...
9711 if ( $nesting_depth_to_go[$ibeg] == $nesting_depth_to_go[$ibeg_next] ) {
9713 # if this is not first line of the batch ...
9716 # and we have leading operator
9717 next if $has_leading_op;
9720 # 1. the previous line is at lesser depth, or
9721 # 2. the previous line ends in an assignment
9723 # Example 1: previous line at lesser depth
9724 # if ( ( $Year < 1601 ) # <- we are here but
9725 # || ( $Year > 2899 ) # list has not yet
9726 # || ( $EndYear < 1601 ) # collapsed vertically
9727 # || ( $EndYear > 2899 ) )
9730 # Example 2: previous line ending in assignment:
9732 # $year % 4 ? 0 # <- We are here
9738 $is_assignment{ $types_to_go[$iendm] }
9739 || ( $nesting_depth_to_go[$ibegm] <
9740 $nesting_depth_to_go[$ibeg] )
9743 # we will add padding before the first token
9747 # for first line of the batch..
9750 # WARNING: Never indent if first line is starting in a
9751 # continued quote, which would change the quote.
9752 next if $starting_in_quote;
9754 # if this is text after closing '}'
9755 # then look for an interior token to pad
9756 if ( $types_to_go[$ibeg] eq '}' ) {
9760 # otherwise, we might pad if it looks really good
9763 # we might pad token $ibeg, so be sure that it
9764 # is at the same depth as the next line.
9766 if ( $nesting_depth_to_go[$ibeg] !=
9767 $nesting_depth_to_go[$ibeg_next] );
9769 # We can pad on line 1 of a statement if at least 3
9770 # lines will be aligned. Otherwise, it
9771 # can look very confusing.
9773 # We have to be careful not to pad if there are too few
9774 # lines. The current rule is:
9775 # (1) in general we require at least 3 consecutive lines
9776 # with the same leading chain operator token,
9777 # (2) but an exception is that we only require two lines
9778 # with leading colons if there are no more lines. For example,
9779 # the first $i in the following snippet would get padding
9780 # by the second rule:
9782 # $i == 1 ? ( "First", "Color" )
9783 # : $i == 2 ? ( "Then", "Rarity" )
9784 # : ( "Then", "Name" );
9786 if ( $max_line > 1 ) {
9787 my $leading_token = $tokens_to_go[$ibeg_next];
9790 # never indent line 1 of a '.' series because
9791 # previous line is most likely at same level.
9792 # TODO: we should also look at the leasing_spaces
9793 # of the last output line and skip if it is same
9795 next if ( $leading_token eq '.' );
9798 foreach my $l ( 2 .. 3 ) {
9799 last if ( $line + $l > $max_line );
9800 my $ibeg_next_next = $$ri_first[ $line + $l ];
9801 if ( $tokens_to_go[$ibeg_next_next] ne
9809 next if ($tokens_differ);
9810 next if ( $count < 3 && $leading_token ne ':' );
9820 # find interior token to pad if necessary
9821 if ( !defined($ipad) ) {
9823 for ( my $i = $ibeg ; ( $i < $iend ) && !$ipad ; $i++ ) {
9825 # find any unclosed container
9827 unless ( $type_sequence_to_go[$i]
9828 && $mate_index_to_go[$i] > $iend );
9830 # find next nonblank token to pad
9832 if ( $types_to_go[$ipad] eq 'b' ) {
9834 last if ( $ipad > $iend );
9840 # next line must not be at greater depth
9841 my $iend_next = $$ri_last[ $line + 1 ];
9843 if ( $nesting_depth_to_go[ $iend_next + 1 ] >
9844 $nesting_depth_to_go[$ipad] );
9846 # lines must be somewhat similar to be padded..
9847 my $inext_next = $ibeg_next + 1;
9848 if ( $types_to_go[$inext_next] eq 'b' ) {
9851 my $type = $types_to_go[$ipad];
9853 # see if there are multiple continuation lines
9854 my $logical_continuation_lines = 1;
9855 if ( $line + 2 <= $max_line ) {
9856 my $leading_token = $tokens_to_go[$ibeg_next];
9857 my $ibeg_next_next = $$ri_first[ $line + 2 ];
9858 if ( $tokens_to_go[$ibeg_next_next] eq $leading_token
9859 && $nesting_depth_to_go[$ibeg_next] eq
9860 $nesting_depth_to_go[$ibeg_next_next] )
9862 $logical_continuation_lines++;
9867 # either we have multiple continuation lines to follow
9868 # and we are not padding the first token
9869 ( $logical_continuation_lines > 1 && $ipad > 0 )
9875 $types_to_go[$inext_next] eq $type
9877 # and keywords must match if keyword
9880 && $tokens_to_go[$ipad] ne $tokens_to_go[$inext_next]
9886 #----------------------begin special checks--------------
9889 # A check is needed before we can make the pad.
9890 # If we are in a list with some long items, we want each
9891 # item to stand out. So in the following example, the
9892 # first line begining with '$casefold->' would look good
9893 # padded to align with the next line, but then it
9894 # would be indented more than the last line, so we
9898 # $casefold->{code} eq '0041'
9899 # && $casefold->{status} eq 'C'
9900 # && $casefold->{mapping} eq '0061',
9905 # It would be faster, and almost as good, to use a comma
9906 # count, and not pad if comma_count > 1 and the previous
9907 # line did not end with a comma.
9911 my $ibg = $$ri_first[ $line + 1 ];
9912 my $depth = $nesting_depth_to_go[ $ibg + 1 ];
9914 # just use simplified formula for leading spaces to avoid
9915 # needless sub calls
9916 my $lsp = $levels_to_go[$ibg] + $ci_levels_to_go[$ibg];
9918 # look at each line beyond the next ..
9920 foreach $l ( $line + 2 .. $max_line ) {
9921 my $ibg = $$ri_first[$l];
9923 # quit looking at the end of this container
9925 if ( $nesting_depth_to_go[ $ibg + 1 ] < $depth )
9926 || ( $nesting_depth_to_go[$ibg] < $depth );
9928 # cannot do the pad if a later line would be
9930 if ( $levels_to_go[$ibg] + $ci_levels_to_go[$ibg] < $lsp ) {
9936 # don't pad if we end in a broken list
9937 if ( $l == $max_line ) {
9938 my $i2 = $$ri_last[$l];
9939 if ( $types_to_go[$i2] eq '#' ) {
9940 my $i1 = $$ri_first[$l];
9943 terminal_type( \@types_to_go, \@block_type_to_go, $i1,
9950 # a minus may introduce a quoted variable, and we will
9951 # add the pad only if this line begins with a bare word,
9952 # such as for the word 'Button' here:
9954 # Button => "Print letter \"~$_\"",
9955 # -command => [ sub { print "$_[0]\n" }, $_ ],
9956 # -accelerator => "Meta+$_"
9959 # On the other hand, if 'Button' is quoted, it looks best
9962 # 'Button' => "Print letter \"~$_\"",
9963 # -command => [ sub { print "$_[0]\n" }, $_ ],
9964 # -accelerator => "Meta+$_"
9966 if ( $types_to_go[$ibeg_next] eq 'm' ) {
9967 $ok_to_pad = 0 if $types_to_go[$ibeg] eq 'Q';
9970 next unless $ok_to_pad;
9972 #----------------------end special check---------------
9974 my $length_1 = total_line_length( $ibeg, $ipad - 1 );
9975 my $length_2 = total_line_length( $ibeg_next, $inext_next - 1 );
9976 $pad_spaces = $length_2 - $length_1;
9978 # make sure this won't change if -lp is used
9979 my $indentation_1 = $leading_spaces_to_go[$ibeg];
9980 if ( ref($indentation_1) ) {
9981 if ( $indentation_1->get_RECOVERABLE_SPACES() == 0 ) {
9982 my $indentation_2 = $leading_spaces_to_go[$ibeg_next];
9983 unless ( $indentation_2->get_RECOVERABLE_SPACES() == 0 ) {
9989 # we might be able to handle a pad of -1 by removing a blank
9991 if ( $pad_spaces < 0 ) {
9992 if ( $pad_spaces == -1 ) {
9993 if ( $ipad > $ibeg && $types_to_go[ $ipad - 1 ] eq 'b' ) {
9994 $tokens_to_go[ $ipad - 1 ] = '';
10000 # now apply any padding for alignment
10001 if ( $ipad >= 0 && $pad_spaces ) {
10002 my $length_t = total_line_length( $ibeg, $iend );
10003 if ( $pad_spaces + $length_t <= $rOpts_maximum_line_length ) {
10004 $tokens_to_go[$ipad] =
10005 ' ' x $pad_spaces . $tokens_to_go[$ipad];
10013 $has_leading_op = $has_leading_op_next;
10014 } # end of loop over lines
10018 sub correct_lp_indentation {
10020 # When the -lp option is used, we need to make a last pass through
10021 # each line to correct the indentation positions in case they differ
10022 # from the predictions. This is necessary because perltidy uses a
10023 # predictor/corrector method for aligning with opening parens. The
10024 # predictor is usually good, but sometimes stumbles. The corrector
10025 # tries to patch things up once the actual opening paren locations
10027 my ( $ri_first, $ri_last ) = @_;
10028 my $do_not_pad = 0;
10030 # Note on flag '$do_not_pad':
10031 # We want to avoid a situation like this, where the aligner inserts
10032 # whitespace before the '=' to align it with a previous '=', because
10033 # otherwise the parens might become mis-aligned in a situation like
10034 # this, where the '=' has become aligned with the previous line,
10035 # pushing the opening '(' forward beyond where we want it.
10037 # $mkFloor::currentRoom = '';
10038 # $mkFloor::c_entry = $c->Entry(
10040 # -relief => 'sunken',
10044 # We leave it to the aligner to decide how to do this.
10046 # first remove continuation indentation if appropriate
10047 my $max_line = @$ri_first - 1;
10049 # looking at each line of this batch..
10050 my ( $ibeg, $iend );
10052 foreach $line ( 0 .. $max_line ) {
10053 $ibeg = $$ri_first[$line];
10054 $iend = $$ri_last[$line];
10056 # looking at each token in this output line..
10058 foreach $i ( $ibeg .. $iend ) {
10060 # How many space characters to place before this token
10061 # for special alignment. Actual padding is done in the
10064 # looking for next unvisited indentation item
10065 my $indentation = $leading_spaces_to_go[$i];
10066 if ( !$indentation->get_MARKED() ) {
10067 $indentation->set_MARKED(1);
10069 # looking for indentation item for which we are aligning
10070 # with parens, braces, and brackets
10071 next unless ( $indentation->get_ALIGN_PAREN() );
10073 # skip closed container on this line
10074 if ( $i > $ibeg ) {
10076 if ( $types_to_go[$im] eq 'b' && $im > $ibeg ) { $im-- }
10077 if ( $type_sequence_to_go[$im]
10078 && $mate_index_to_go[$im] <= $iend )
10084 if ( $line == 1 && $i == $ibeg ) {
10088 # Ok, let's see what the error is and try to fix it
10090 my $predicted_pos = $indentation->get_SPACES();
10091 if ( $i > $ibeg ) {
10093 # token is mid-line - use length to previous token
10094 $actual_pos = total_line_length( $ibeg, $i - 1 );
10096 # for mid-line token, we must check to see if all
10097 # additional lines have continuation indentation,
10098 # and remove it if so. Otherwise, we do not get
10100 my $closing_index = $indentation->get_CLOSED();
10101 if ( $closing_index > $iend ) {
10102 my $ibeg_next = $$ri_first[ $line + 1 ];
10103 if ( $ci_levels_to_go[$ibeg_next] > 0 ) {
10104 undo_lp_ci( $line, $i, $closing_index, $ri_first,
10109 elsif ( $line > 0 ) {
10111 # handle case where token starts a new line;
10112 # use length of previous line
10113 my $ibegm = $$ri_first[ $line - 1 ];
10114 my $iendm = $$ri_last[ $line - 1 ];
10115 $actual_pos = total_line_length( $ibegm, $iendm );
10119 if ( $types_to_go[ $iendm + 1 ] eq 'b' );
10123 # token is first character of first line of batch
10124 $actual_pos = $predicted_pos;
10127 my $move_right = $actual_pos - $predicted_pos;
10129 # done if no error to correct (gnu2.t)
10130 if ( $move_right == 0 ) {
10131 $indentation->set_RECOVERABLE_SPACES($move_right);
10135 # if we have not seen closure for this indentation in
10136 # this batch, we can only pass on a request to the
10138 my $closing_index = $indentation->get_CLOSED();
10140 if ( $closing_index < 0 ) {
10141 $indentation->set_RECOVERABLE_SPACES($move_right);
10145 # If necessary, look ahead to see if there is really any
10146 # leading whitespace dependent on this whitespace, and
10147 # also find the longest line using this whitespace.
10148 # Since it is always safe to move left if there are no
10149 # dependents, we only need to do this if we may have
10150 # dependent nodes or need to move right.
10152 my $right_margin = 0;
10153 my $have_child = $indentation->get_HAVE_CHILD();
10155 my %saw_indentation;
10156 my $line_count = 1;
10157 $saw_indentation{$indentation} = $indentation;
10159 if ( $have_child || $move_right > 0 ) {
10161 my $max_length = 0;
10162 if ( $i == $ibeg ) {
10163 $max_length = total_line_length( $ibeg, $iend );
10166 # look ahead at the rest of the lines of this batch..
10168 foreach $line_t ( $line + 1 .. $max_line ) {
10169 my $ibeg_t = $$ri_first[$line_t];
10170 my $iend_t = $$ri_last[$line_t];
10171 last if ( $closing_index <= $ibeg_t );
10173 # remember all different indentation objects
10174 my $indentation_t = $leading_spaces_to_go[$ibeg_t];
10175 $saw_indentation{$indentation_t} = $indentation_t;
10178 # remember longest line in the group
10179 my $length_t = total_line_length( $ibeg_t, $iend_t );
10180 if ( $length_t > $max_length ) {
10181 $max_length = $length_t;
10184 $right_margin = $rOpts_maximum_line_length - $max_length;
10185 if ( $right_margin < 0 ) { $right_margin = 0 }
10188 my $first_line_comma_count =
10189 grep { $_ eq ',' } @types_to_go[ $ibeg .. $iend ];
10190 my $comma_count = $indentation->get_COMMA_COUNT();
10191 my $arrow_count = $indentation->get_ARROW_COUNT();
10193 # This is a simple approximate test for vertical alignment:
10194 # if we broke just after an opening paren, brace, bracket,
10195 # and there are 2 or more commas in the first line,
10196 # and there are no '=>'s,
10197 # then we are probably vertically aligned. We could set
10198 # an exact flag in sub scan_list, but this is good
10200 my $indentation_count = keys %saw_indentation;
10201 my $is_vertically_aligned =
10203 && $first_line_comma_count > 1
10204 && $indentation_count == 1
10205 && ( $arrow_count == 0 || $arrow_count == $line_count ) );
10207 # Make the move if possible ..
10210 # we can always move left
10213 # but we should only move right if we are sure it will
10214 # not spoil vertical alignment
10215 || ( $comma_count == 0 )
10216 || ( $comma_count > 0 && !$is_vertically_aligned )
10220 ( $move_right <= $right_margin )
10224 foreach ( keys %saw_indentation ) {
10225 $saw_indentation{$_}
10226 ->permanently_decrease_AVAILABLE_SPACES( -$move );
10230 # Otherwise, record what we want and the vertical aligner
10231 # will try to recover it.
10233 $indentation->set_RECOVERABLE_SPACES($move_right);
10238 return $do_not_pad;
10241 # flush is called to output any tokens in the pipeline, so that
10242 # an alternate source of lines can be written in the correct order
10245 destroy_one_line_block();
10246 output_line_to_go();
10247 Perl::Tidy::VerticalAligner::flush();
10250 sub reset_block_text_accumulator {
10252 # save text after 'if' and 'elsif' to append after 'else'
10253 if ($accumulating_text_for_block) {
10255 if ( $accumulating_text_for_block =~ /^(if|elsif)$/ ) {
10256 push @{$rleading_block_if_elsif_text}, $leading_block_text;
10259 $accumulating_text_for_block = "";
10260 $leading_block_text = "";
10261 $leading_block_text_level = 0;
10262 $leading_block_text_length_exceeded = 0;
10263 $leading_block_text_line_number = 0;
10264 $leading_block_text_line_length = 0;
10267 sub set_block_text_accumulator {
10269 $accumulating_text_for_block = $tokens_to_go[$i];
10270 if ( $accumulating_text_for_block !~ /^els/ ) {
10271 $rleading_block_if_elsif_text = [];
10273 $leading_block_text = "";
10274 $leading_block_text_level = $levels_to_go[$i];
10275 $leading_block_text_line_number =
10276 $vertical_aligner_object->get_output_line_number();
10277 $leading_block_text_length_exceeded = 0;
10279 # this will contain the column number of the last character
10280 # of the closing side comment
10281 $leading_block_text_line_length =
10282 length($accumulating_text_for_block) +
10283 length( $rOpts->{'closing-side-comment-prefix'} ) +
10284 $leading_block_text_level * $rOpts_indent_columns + 3;
10287 sub accumulate_block_text {
10290 # accumulate leading text for -csc, ignoring any side comments
10291 if ( $accumulating_text_for_block
10292 && !$leading_block_text_length_exceeded
10293 && $types_to_go[$i] ne '#' )
10296 my $added_length = length( $tokens_to_go[$i] );
10297 $added_length += 1 if $i == 0;
10298 my $new_line_length = $leading_block_text_line_length + $added_length;
10300 # we can add this text if we don't exceed some limits..
10303 # we must not have already exceeded the text length limit
10304 length($leading_block_text) <
10305 $rOpts_closing_side_comment_maximum_text
10308 # the new total line length must be below the line length limit
10309 # or the new length must be below the text length limit
10310 # (ie, we may allow one token to exceed the text length limit)
10311 && ( $new_line_length < $rOpts_maximum_line_length
10312 || length($leading_block_text) + $added_length <
10313 $rOpts_closing_side_comment_maximum_text )
10315 # UNLESS: we are adding a closing paren before the brace we seek.
10316 # This is an attempt to avoid situations where the ... to be
10317 # added are longer than the omitted right paren, as in:
10319 # foreach my $item (@a_rather_long_variable_name_here) {
10321 # } ## end foreach my $item (@a_rather_long_variable_name_here...
10324 $tokens_to_go[$i] eq ')'
10327 $i + 1 <= $max_index_to_go
10328 && $block_type_to_go[ $i + 1 ] eq
10329 $accumulating_text_for_block
10331 || ( $i + 2 <= $max_index_to_go
10332 && $block_type_to_go[ $i + 2 ] eq
10333 $accumulating_text_for_block )
10339 # add an extra space at each newline
10340 if ( $i == 0 ) { $leading_block_text .= ' ' }
10342 # add the token text
10343 $leading_block_text .= $tokens_to_go[$i];
10344 $leading_block_text_line_length = $new_line_length;
10347 # show that text was truncated if necessary
10348 elsif ( $types_to_go[$i] ne 'b' ) {
10349 $leading_block_text_length_exceeded = 1;
10350 $leading_block_text .= '...';
10356 my %is_if_elsif_else_unless_while_until_for_foreach;
10360 # These block types may have text between the keyword and opening
10361 # curly. Note: 'else' does not, but must be included to allow trailing
10362 # if/elsif text to be appended.
10363 # patch for SWITCH/CASE: added 'case' and 'when'
10364 @_ = qw(if elsif else unless while until for foreach case when);
10365 @is_if_elsif_else_unless_while_until_for_foreach{@_} = (1) x scalar(@_);
10368 sub accumulate_csc_text {
10370 # called once per output buffer when -csc is used. Accumulates
10371 # the text placed after certain closing block braces.
10372 # Defines and returns the following for this buffer:
10374 my $block_leading_text = ""; # the leading text of the last '}'
10375 my $rblock_leading_if_elsif_text;
10376 my $i_block_leading_text =
10377 -1; # index of token owning block_leading_text
10378 my $block_line_count = 100; # how many lines the block spans
10379 my $terminal_type = 'b'; # type of last nonblank token
10380 my $i_terminal = 0; # index of last nonblank token
10381 my $terminal_block_type = "";
10383 for my $i ( 0 .. $max_index_to_go ) {
10384 my $type = $types_to_go[$i];
10385 my $block_type = $block_type_to_go[$i];
10386 my $token = $tokens_to_go[$i];
10388 # remember last nonblank token type
10389 if ( $type ne '#' && $type ne 'b' ) {
10390 $terminal_type = $type;
10391 $terminal_block_type = $block_type;
10395 my $type_sequence = $type_sequence_to_go[$i];
10396 if ( $block_type && $type_sequence ) {
10398 if ( $token eq '}' ) {
10400 # restore any leading text saved when we entered this block
10401 if ( defined( $block_leading_text{$type_sequence} ) ) {
10402 ( $block_leading_text, $rblock_leading_if_elsif_text ) =
10403 @{ $block_leading_text{$type_sequence} };
10404 $i_block_leading_text = $i;
10405 delete $block_leading_text{$type_sequence};
10406 $rleading_block_if_elsif_text =
10407 $rblock_leading_if_elsif_text;
10410 # if we run into a '}' then we probably started accumulating
10411 # at something like a trailing 'if' clause..no harm done.
10412 if ( $accumulating_text_for_block
10413 && $levels_to_go[$i] <= $leading_block_text_level )
10415 my $lev = $levels_to_go[$i];
10416 reset_block_text_accumulator();
10419 if ( defined( $block_opening_line_number{$type_sequence} ) )
10421 my $output_line_number =
10422 $vertical_aligner_object->get_output_line_number();
10423 $block_line_count =
10424 $output_line_number -
10425 $block_opening_line_number{$type_sequence} + 1;
10426 delete $block_opening_line_number{$type_sequence};
10430 # Error: block opening line undefined for this line..
10431 # This shouldn't be possible, but it is not a
10432 # significant problem.
10436 elsif ( $token eq '{' ) {
10439 $vertical_aligner_object->get_output_line_number();
10440 $block_opening_line_number{$type_sequence} = $line_number;
10442 if ( $accumulating_text_for_block
10443 && $levels_to_go[$i] == $leading_block_text_level )
10446 if ( $accumulating_text_for_block eq $block_type ) {
10448 # save any leading text before we enter this block
10449 $block_leading_text{$type_sequence} = [
10450 $leading_block_text,
10451 $rleading_block_if_elsif_text
10453 $block_opening_line_number{$type_sequence} =
10454 $leading_block_text_line_number;
10455 reset_block_text_accumulator();
10459 # shouldn't happen, but not a serious error.
10460 # We were accumulating -csc text for block type
10461 # $accumulating_text_for_block and unexpectedly
10462 # encountered a '{' for block type $block_type.
10469 && $csc_new_statement_ok
10470 && $is_if_elsif_else_unless_while_until_for_foreach{$token}
10471 && $token =~ /$closing_side_comment_list_pattern/o )
10473 set_block_text_accumulator($i);
10477 # note: ignoring type 'q' because of tricks being played
10478 # with 'q' for hanging side comments
10479 if ( $type ne 'b' && $type ne '#' && $type ne 'q' ) {
10480 $csc_new_statement_ok =
10481 ( $block_type || $type eq 'J' || $type eq ';' );
10484 && $accumulating_text_for_block
10485 && $levels_to_go[$i] == $leading_block_text_level )
10487 reset_block_text_accumulator();
10490 accumulate_block_text($i);
10495 # Treat an 'else' block specially by adding preceding 'if' and
10496 # 'elsif' text. Otherwise, the 'end else' is not helpful,
10497 # especially for cuddled-else formatting.
10498 if ( $terminal_block_type =~ /^els/ && $rblock_leading_if_elsif_text ) {
10499 $block_leading_text =
10500 make_else_csc_text( $i_terminal, $terminal_block_type,
10501 $block_leading_text, $rblock_leading_if_elsif_text );
10504 return ( $terminal_type, $i_terminal, $i_block_leading_text,
10505 $block_leading_text, $block_line_count );
10509 sub make_else_csc_text {
10511 # create additional -csc text for an 'else' and optionally 'elsif',
10512 # depending on the value of switch
10513 # $rOpts_closing_side_comment_else_flag:
10515 # = 0 add 'if' text to trailing else
10516 # = 1 same as 0 plus:
10517 # add 'if' to 'elsif's if can fit in line length
10518 # add last 'elsif' to trailing else if can fit in one line
10519 # = 2 same as 1 but do not check if exceed line length
10521 # $rif_elsif_text = a reference to a list of all previous closing
10522 # side comments created for this if block
10524 my ( $i_terminal, $block_type, $block_leading_text, $rif_elsif_text ) = @_;
10525 my $csc_text = $block_leading_text;
10527 if ( $block_type eq 'elsif' && $rOpts_closing_side_comment_else_flag == 0 )
10532 my $count = @{$rif_elsif_text};
10533 return $csc_text unless ($count);
10535 my $if_text = '[ if' . $rif_elsif_text->[0];
10537 # always show the leading 'if' text on 'else'
10538 if ( $block_type eq 'else' ) {
10539 $csc_text .= $if_text;
10542 # see if that's all
10543 if ( $rOpts_closing_side_comment_else_flag == 0 ) {
10547 my $last_elsif_text = "";
10548 if ( $count > 1 ) {
10549 $last_elsif_text = ' [elsif' . $rif_elsif_text->[ $count - 1 ];
10550 if ( $count > 2 ) { $last_elsif_text = ' [...' . $last_elsif_text; }
10553 # tentatively append one more item
10554 my $saved_text = $csc_text;
10555 if ( $block_type eq 'else' ) {
10556 $csc_text .= $last_elsif_text;
10559 $csc_text .= ' ' . $if_text;
10562 # all done if no length checks requested
10563 if ( $rOpts_closing_side_comment_else_flag == 2 ) {
10567 # undo it if line length exceeded
10569 length($csc_text) +
10570 length($block_type) +
10571 length( $rOpts->{'closing-side-comment-prefix'} ) +
10572 $levels_to_go[$i_terminal] * $rOpts_indent_columns + 3;
10573 if ( $length > $rOpts_maximum_line_length ) {
10574 $csc_text = $saved_text;
10579 sub add_closing_side_comment {
10581 # add closing side comments after closing block braces if -csc used
10582 my $cscw_block_comment;
10584 #---------------------------------------------------------------
10585 # Step 1: loop through all tokens of this line to accumulate
10586 # the text needed to create the closing side comments. Also see
10587 # how the line ends.
10588 #---------------------------------------------------------------
10590 my ( $terminal_type, $i_terminal, $i_block_leading_text,
10591 $block_leading_text, $block_line_count )
10592 = accumulate_csc_text();
10594 #---------------------------------------------------------------
10595 # Step 2: make the closing side comment if this ends a block
10596 #---------------------------------------------------------------
10597 my $have_side_comment = $i_terminal != $max_index_to_go;
10599 # if this line might end in a block closure..
10601 $terminal_type eq '}'
10606 # the block is long enough
10607 ( $block_line_count >= $rOpts->{'closing-side-comment-interval'} )
10609 # or there is an existing comment to check
10610 || ( $have_side_comment
10611 && $rOpts->{'closing-side-comment-warnings'} )
10614 # .. and if this is one of the types of interest
10615 && $block_type_to_go[$i_terminal] =~
10616 /$closing_side_comment_list_pattern/o
10618 # .. but not an anonymous sub
10619 # These are not normally of interest, and their closing braces are
10620 # often followed by commas or semicolons anyway. This also avoids
10621 # possible erratic output due to line numbering inconsistencies
10622 # in the cases where their closing braces terminate a line.
10623 && $block_type_to_go[$i_terminal] ne 'sub'
10625 # ..and the corresponding opening brace must is not in this batch
10626 # (because we do not need to tag one-line blocks, although this
10627 # should also be caught with a positive -csci value)
10628 && $mate_index_to_go[$i_terminal] < 0
10633 # this is the last token (line doesnt have a side comment)
10634 !$have_side_comment
10636 # or the old side comment is a closing side comment
10637 || $tokens_to_go[$max_index_to_go] =~
10638 /$closing_side_comment_prefix_pattern/o
10643 # then make the closing side comment text
10645 "$rOpts->{'closing-side-comment-prefix'} $block_type_to_go[$i_terminal]";
10647 # append any extra descriptive text collected above
10648 if ( $i_block_leading_text == $i_terminal ) {
10649 $token .= $block_leading_text;
10651 $token =~ s/\s*$//; # trim any trailing whitespace
10653 # handle case of existing closing side comment
10654 if ($have_side_comment) {
10656 # warn if requested and tokens differ significantly
10657 if ( $rOpts->{'closing-side-comment-warnings'} ) {
10658 my $old_csc = $tokens_to_go[$max_index_to_go];
10659 my $new_csc = $token;
10660 $new_csc =~ s/(\.\.\.)\s*$//; # trim trailing '...'
10661 my $new_trailing_dots = $1;
10662 $old_csc =~ s/\.\.\.\s*$//;
10663 $new_csc =~ s/\s+//g; # trim all whitespace
10664 $old_csc =~ s/\s+//g;
10666 # Patch to handle multiple closing side comments at
10667 # else and elsif's. These have become too complicated
10668 # to check, so if we see an indication of
10669 # '[ if' or '[ # elsif', then assume they were made
10671 if ( $block_type_to_go[$i_terminal] eq 'else' ) {
10672 if ( $old_csc =~ /\[\s*elsif/ ) { $old_csc = $new_csc }
10674 elsif ( $block_type_to_go[$i_terminal] eq 'elsif' ) {
10675 if ( $old_csc =~ /\[\s*if/ ) { $old_csc = $new_csc }
10678 # if old comment is contained in new comment,
10679 # only compare the common part.
10680 if ( length($new_csc) > length($old_csc) ) {
10681 $new_csc = substr( $new_csc, 0, length($old_csc) );
10684 # if the new comment is shorter and has been limited,
10685 # only compare the common part.
10686 if ( length($new_csc) < length($old_csc) && $new_trailing_dots )
10688 $old_csc = substr( $old_csc, 0, length($new_csc) );
10691 # any remaining difference?
10692 if ( $new_csc ne $old_csc ) {
10694 # just leave the old comment if we are below the threshold
10695 # for creating side comments
10696 if ( $block_line_count <
10697 $rOpts->{'closing-side-comment-interval'} )
10702 # otherwise we'll make a note of it
10706 "perltidy -cscw replaced: $tokens_to_go[$max_index_to_go]\n"
10709 # save the old side comment in a new trailing block comment
10710 my ( $day, $month, $year ) = (localtime)[ 3, 4, 5 ];
10713 $cscw_block_comment =
10714 "## perltidy -cscw $year-$month-$day: $tokens_to_go[$max_index_to_go]";
10719 # No differences.. we can safely delete old comment if we
10720 # are below the threshold
10721 if ( $block_line_count <
10722 $rOpts->{'closing-side-comment-interval'} )
10725 unstore_token_to_go()
10726 if ( $types_to_go[$max_index_to_go] eq '#' );
10727 unstore_token_to_go()
10728 if ( $types_to_go[$max_index_to_go] eq 'b' );
10733 # switch to the new csc (unless we deleted it!)
10734 $tokens_to_go[$max_index_to_go] = $token if $token;
10737 # handle case of NO existing closing side comment
10740 # insert the new side comment into the output token stream
10742 my $block_type = '';
10743 my $type_sequence = '';
10744 my $container_environment =
10745 $container_environment_to_go[$max_index_to_go];
10746 my $level = $levels_to_go[$max_index_to_go];
10747 my $slevel = $nesting_depth_to_go[$max_index_to_go];
10748 my $no_internal_newlines = 0;
10750 my $nesting_blocks = $nesting_blocks_to_go[$max_index_to_go];
10751 my $ci_level = $ci_levels_to_go[$max_index_to_go];
10752 my $in_continued_quote = 0;
10754 # first insert a blank token
10755 insert_new_token_to_go( ' ', 'b', $slevel, $no_internal_newlines );
10757 # then the side comment
10758 insert_new_token_to_go( $token, $type, $slevel,
10759 $no_internal_newlines );
10762 return $cscw_block_comment;
10765 sub previous_nonblank_token {
10770 elsif ( $types_to_go[ $i - 1 ] ne 'b' ) {
10771 return $tokens_to_go[ $i - 1 ];
10774 return $tokens_to_go[ $i - 2 ];
10781 sub send_lines_to_vertical_aligner {
10783 my ( $ri_first, $ri_last, $do_not_pad ) = @_;
10785 my $rindentation_list = [0]; # ref to indentations for each line
10787 # define the array @matching_token_to_go for the output tokens
10788 # which will be non-blank for each special token (such as =>)
10789 # for which alignment is required.
10790 set_vertical_alignment_markers( $ri_first, $ri_last );
10792 # flush if necessary to avoid unwanted alignment
10793 my $must_flush = 0;
10794 if ( @$ri_first > 1 ) {
10796 # flush before a long if statement
10797 if ( $types_to_go[0] eq 'k' && $tokens_to_go[0] =~ /^(if|unless)$/ ) {
10802 Perl::Tidy::VerticalAligner::flush();
10805 set_logical_padding( $ri_first, $ri_last );
10807 # loop to prepare each line for shipment
10808 my $n_last_line = @$ri_first - 1;
10810 for my $n ( 0 .. $n_last_line ) {
10811 my $ibeg = $$ri_first[$n];
10812 my $iend = $$ri_last[$n];
10817 my $i_start = $ibeg;
10821 my @container_name = ("");
10822 my @multiple_comma_arrows = (undef);
10824 my $j = 0; # field index
10827 for $i ( $ibeg .. $iend ) {
10829 # Keep track of containers balanced on this line only.
10830 # These are used below to prevent unwanted cross-line alignments.
10831 # Unbalanced containers already avoid aligning across
10832 # container boundaries.
10833 if ( $tokens_to_go[$i] eq '(' ) {
10834 my $i_mate = $mate_index_to_go[$i];
10835 if ( $i_mate > $i && $i_mate <= $iend ) {
10837 my $seqno = $type_sequence_to_go[$i];
10838 my $count = comma_arrow_count($seqno);
10839 $multiple_comma_arrows[$depth] = $count && $count > 1;
10840 my $name = previous_nonblank_token($i);
10842 $container_name[$depth] = "+" . $name;
10845 elsif ( $tokens_to_go[$i] eq ')' ) {
10846 $depth-- if $depth > 0;
10849 # if we find a new synchronization token, we are done with
10851 if ( $i > $i_start && $matching_token_to_go[$i] ne '' ) {
10853 my $tok = my $raw_tok = $matching_token_to_go[$i];
10855 # make separators in different nesting depths unique
10856 # by appending the nesting depth digit.
10857 if ( $raw_tok ne '#' ) {
10858 $tok .= "$nesting_depth_to_go[$i]";
10861 # do any special decorations for commas to avoid unwanted
10862 # cross-line alignments.
10863 if ( $raw_tok eq ',' ) {
10864 if ( $container_name[$depth] ) {
10865 $tok .= $container_name[$depth];
10869 # decorate '=>' with:
10870 # - Nothing if this container is unbalanced on this line.
10871 # - The previous token if it is balanced and multiple '=>'s
10872 # - The container name if it is bananced and no other '=>'s
10873 elsif ( $raw_tok eq '=>' ) {
10874 if ( $container_name[$depth] ) {
10875 if ( $multiple_comma_arrows[$depth] ) {
10876 $tok .= "+" . previous_nonblank_token($i);
10879 $tok .= $container_name[$depth];
10884 # concatenate the text of the consecutive tokens to form
10887 join( '', @tokens_to_go[ $i_start .. $i - 1 ] ) );
10889 # store the alignment token for this field
10890 push( @tokens, $tok );
10892 # get ready for the next batch
10895 $patterns[$j] = "";
10898 # continue accumulating tokens
10899 # handle non-keywords..
10900 if ( $types_to_go[$i] ne 'k' ) {
10901 my $type = $types_to_go[$i];
10903 # Mark most things before arrows as a quote to
10904 # get them to line up. Testfile: mixed.pl.
10905 if ( ( $i < $iend - 1 ) && ( $type =~ /^[wnC]$/ ) ) {
10906 my $next_type = $types_to_go[ $i + 1 ];
10907 my $i_next_nonblank =
10908 ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
10910 if ( $types_to_go[$i_next_nonblank] eq '=>' ) {
10915 # minor patch to make numbers and quotes align
10916 if ( $type eq 'n' ) { $type = 'Q' }
10918 $patterns[$j] .= $type;
10921 # for keywords we have to use the actual text
10924 # map certain keywords to the same 'if' class to align
10925 # long if/elsif sequences. my testfile: elsif.pl
10926 my $tok = $tokens_to_go[$i];
10927 if ( $n == 0 && $tok =~ /^(elsif|else|unless)$/ ) {
10930 $patterns[$j] .= $tok;
10934 # done with this line .. join text of tokens to make the last field
10935 push( @fields, join( '', @tokens_to_go[ $i_start .. $iend ] ) );
10937 my ( $indentation, $lev, $level_end, $terminal_type,
10938 $is_semicolon_terminated, $is_outdented_line )
10939 = set_adjusted_indentation( $ibeg, $iend, \@fields, \@patterns,
10940 $ri_first, $ri_last, $rindentation_list );
10942 # we will allow outdenting of long lines..
10943 my $outdent_long_lines = (
10945 # which are long quotes, if allowed
10946 ( $types_to_go[$ibeg] eq 'Q' && $rOpts->{'outdent-long-quotes'} )
10948 # which are long block comments, if allowed
10950 $types_to_go[$ibeg] eq '#'
10951 && $rOpts->{'outdent-long-comments'}
10953 # but not if this is a static block comment
10954 && !$is_static_block_comment
10959 $nesting_depth_to_go[ $iend + 1 ] - $nesting_depth_to_go[$ibeg];
10961 my $rvertical_tightness_flags =
10962 set_vertical_tightness_flags( $n, $n_last_line, $ibeg, $iend,
10963 $ri_first, $ri_last );
10965 # flush an outdented line to avoid any unwanted vertical alignment
10966 Perl::Tidy::VerticalAligner::flush() if ($is_outdented_line);
10968 my $is_terminal_ternary = 0;
10969 if ( $tokens_to_go[$ibeg] eq ':'
10970 || $n > 0 && $tokens_to_go[ $$ri_last[ $n - 1 ] ] eq ':' )
10972 if ( ( $terminal_type eq ';' && $level_end <= $lev )
10973 || ( $level_end < $lev ) )
10975 $is_terminal_ternary = 1;
10979 # send this new line down the pipe
10980 my $forced_breakpoint = $forced_breakpoint_to_go[$iend];
10981 Perl::Tidy::VerticalAligner::append_line(
10988 $forced_breakpoint_to_go[$iend] || $in_comma_list,
10989 $outdent_long_lines,
10990 $is_terminal_ternary,
10991 $is_semicolon_terminated,
10993 $rvertical_tightness_flags,
10997 $tokens_to_go[$iend] eq ',' && $forced_breakpoint_to_go[$iend];
10999 # flush an outdented line to avoid any unwanted vertical alignment
11000 Perl::Tidy::VerticalAligner::flush() if ($is_outdented_line);
11004 } # end of loop to output each line
11006 # remember indentation of lines containing opening containers for
11007 # later use by sub set_adjusted_indentation
11008 save_opening_indentation( $ri_first, $ri_last, $rindentation_list );
11011 { # begin unmatched_indexes
11013 # closure to keep track of unbalanced containers.
11014 # arrays shared by the routines in this block:
11015 my @unmatched_opening_indexes_in_this_batch;
11016 my @unmatched_closing_indexes_in_this_batch;
11017 my %comma_arrow_count;
11019 sub is_unbalanced_batch {
11020 @unmatched_opening_indexes_in_this_batch +
11021 @unmatched_closing_indexes_in_this_batch;
11024 sub comma_arrow_count {
11026 return $comma_arrow_count{$seqno};
11029 sub match_opening_and_closing_tokens {
11031 # Match up indexes of opening and closing braces, etc, in this batch.
11032 # This has to be done after all tokens are stored because unstoring
11033 # of tokens would otherwise cause trouble.
11035 @unmatched_opening_indexes_in_this_batch = ();
11036 @unmatched_closing_indexes_in_this_batch = ();
11037 %comma_arrow_count = ();
11039 my ( $i, $i_mate, $token );
11040 foreach $i ( 0 .. $max_index_to_go ) {
11041 if ( $type_sequence_to_go[$i] ) {
11042 $token = $tokens_to_go[$i];
11043 if ( $token =~ /^[\(\[\{\?]$/ ) {
11044 push @unmatched_opening_indexes_in_this_batch, $i;
11046 elsif ( $token =~ /^[\)\]\}\:]$/ ) {
11048 $i_mate = pop @unmatched_opening_indexes_in_this_batch;
11049 if ( defined($i_mate) && $i_mate >= 0 ) {
11050 if ( $type_sequence_to_go[$i_mate] ==
11051 $type_sequence_to_go[$i] )
11053 $mate_index_to_go[$i] = $i_mate;
11054 $mate_index_to_go[$i_mate] = $i;
11057 push @unmatched_opening_indexes_in_this_batch,
11059 push @unmatched_closing_indexes_in_this_batch, $i;
11063 push @unmatched_closing_indexes_in_this_batch, $i;
11067 elsif ( $tokens_to_go[$i] eq '=>' ) {
11068 if (@unmatched_opening_indexes_in_this_batch) {
11069 my $j = $unmatched_opening_indexes_in_this_batch[-1];
11070 my $seqno = $type_sequence_to_go[$j];
11071 $comma_arrow_count{$seqno}++;
11077 sub save_opening_indentation {
11079 # This should be called after each batch of tokens is output. It
11080 # saves indentations of lines of all unmatched opening tokens.
11081 # These will be used by sub get_opening_indentation.
11083 my ( $ri_first, $ri_last, $rindentation_list ) = @_;
11085 # we no longer need indentations of any saved indentations which
11086 # are unmatched closing tokens in this batch, because we will
11087 # never encounter them again. So we can delete them to keep
11088 # the hash size down.
11089 foreach (@unmatched_closing_indexes_in_this_batch) {
11090 my $seqno = $type_sequence_to_go[$_];
11091 delete $saved_opening_indentation{$seqno};
11094 # we need to save indentations of any unmatched opening tokens
11095 # in this batch because we may need them in a subsequent batch.
11096 foreach (@unmatched_opening_indexes_in_this_batch) {
11097 my $seqno = $type_sequence_to_go[$_];
11098 $saved_opening_indentation{$seqno} = [
11099 lookup_opening_indentation(
11100 $_, $ri_first, $ri_last, $rindentation_list
11105 } # end unmatched_indexes
11107 sub get_opening_indentation {
11109 # get the indentation of the line which output the opening token
11110 # corresponding to a given closing token in the current output batch.
11113 # $i_closing - index in this line of a closing token ')' '}' or ']'
11115 # $ri_first - reference to list of the first index $i for each output
11116 # line in this batch
11117 # $ri_last - reference to list of the last index $i for each output line
11119 # $rindentation_list - reference to a list containing the indentation
11120 # used for each line.
11123 # -the indentation of the line which contained the opening token
11124 # which matches the token at index $i_opening
11125 # -and its offset (number of columns) from the start of the line
11127 my ( $i_closing, $ri_first, $ri_last, $rindentation_list ) = @_;
11129 # first, see if the opening token is in the current batch
11130 my $i_opening = $mate_index_to_go[$i_closing];
11131 my ( $indent, $offset );
11132 if ( $i_opening >= 0 ) {
11134 # it is..look up the indentation
11135 ( $indent, $offset ) =
11136 lookup_opening_indentation( $i_opening, $ri_first, $ri_last,
11137 $rindentation_list );
11140 # if not, it should have been stored in the hash by a previous batch
11142 my $seqno = $type_sequence_to_go[$i_closing];
11144 if ( $saved_opening_indentation{$seqno} ) {
11145 ( $indent, $offset ) = @{ $saved_opening_indentation{$seqno} };
11148 # some kind of serious error
11149 # (example is badfile.t)
11156 # if no sequence number it must be an unbalanced container
11162 return ( $indent, $offset );
11165 sub lookup_opening_indentation {
11167 # get the indentation of the line in the current output batch
11168 # which output a selected opening token
11171 # $i_opening - index of an opening token in the current output batch
11172 # whose line indentation we need
11173 # $ri_first - reference to list of the first index $i for each output
11174 # line in this batch
11175 # $ri_last - reference to list of the last index $i for each output line
11177 # $rindentation_list - reference to a list containing the indentation
11178 # used for each line. (NOTE: the first slot in
11179 # this list is the last returned line number, and this is
11180 # followed by the list of indentations).
11183 # -the indentation of the line which contained token $i_opening
11184 # -and its offset (number of columns) from the start of the line
11186 my ( $i_opening, $ri_start, $ri_last, $rindentation_list ) = @_;
11188 my $nline = $rindentation_list->[0]; # line number of previous lookup
11190 # reset line location if necessary
11191 $nline = 0 if ( $i_opening < $ri_start->[$nline] );
11193 # find the correct line
11194 unless ( $i_opening > $ri_last->[-1] ) {
11195 while ( $i_opening > $ri_last->[$nline] ) { $nline++; }
11198 # error - token index is out of bounds - shouldn't happen
11201 "non-fatal program bug in lookup_opening_indentation - index out of range\n"
11203 report_definite_bug();
11204 $nline = $#{$ri_last};
11207 $rindentation_list->[0] =
11208 $nline; # save line number to start looking next call
11209 my $ibeg = $ri_start->[$nline];
11210 my $offset = token_sequence_length( $ibeg, $i_opening ) - 1;
11211 return ( $rindentation_list->[ $nline + 1 ], $offset );
11215 my %is_if_elsif_else_unless_while_until_for_foreach;
11219 # These block types may have text between the keyword and opening
11220 # curly. Note: 'else' does not, but must be included to allow trailing
11221 # if/elsif text to be appended.
11222 # patch for SWITCH/CASE: added 'case' and 'when'
11223 @_ = qw(if elsif else unless while until for foreach case when);
11224 @is_if_elsif_else_unless_while_until_for_foreach{@_} = (1) x scalar(@_);
11227 sub set_adjusted_indentation {
11229 # This routine has the final say regarding the actual indentation of
11230 # a line. It starts with the basic indentation which has been
11231 # defined for the leading token, and then takes into account any
11232 # options that the user has set regarding special indenting and
11235 my ( $ibeg, $iend, $rfields, $rpatterns, $ri_first, $ri_last,
11236 $rindentation_list )
11239 # we need to know the last token of this line
11240 my ( $terminal_type, $i_terminal ) =
11241 terminal_type( \@types_to_go, \@block_type_to_go, $ibeg, $iend );
11243 my $is_outdented_line = 0;
11245 my $is_semicolon_terminated = $terminal_type eq ';'
11246 && $nesting_depth_to_go[$iend] < $nesting_depth_to_go[$ibeg];
11248 ##########################################################
11249 # Section 1: set a flag and a default indentation
11251 # Most lines are indented according to the initial token.
11252 # But it is common to outdent to the level just after the
11253 # terminal token in certain cases...
11254 # adjust_indentation flag:
11255 # 0 - do not adjust
11257 # 2 - vertically align with opening token
11259 ##########################################################
11260 my $adjust_indentation = 0;
11261 my $default_adjust_indentation = $adjust_indentation;
11263 my ( $opening_indentation, $opening_offset );
11265 # if we are at a closing token of some type..
11266 if ( $types_to_go[$ibeg] =~ /^[\)\}\]]$/ ) {
11268 # get the indentation of the line containing the corresponding
11270 ( $opening_indentation, $opening_offset ) =
11271 get_opening_indentation( $ibeg, $ri_first, $ri_last,
11272 $rindentation_list );
11274 # First set the default behavior:
11275 # default behavior is to outdent closing lines
11276 # of the form: "); }; ]; )->xxx;"
11278 $is_semicolon_terminated
11280 # and 'cuddled parens' of the form: ")->pack("
11282 $terminal_type eq '('
11283 && $types_to_go[$ibeg] eq ')'
11284 && ( $nesting_depth_to_go[$iend] + 1 ==
11285 $nesting_depth_to_go[$ibeg] )
11289 $adjust_indentation = 1;
11292 # TESTING: outdent something like '),'
11294 $terminal_type eq ','
11296 # allow just one character before the comma
11297 && $i_terminal == $ibeg + 1
11299 # requre LIST environment; otherwise, we may outdent too much --
11300 # this can happen in calls without parentheses (overload.t);
11301 && $container_environment_to_go[$i_terminal] eq 'LIST'
11304 $adjust_indentation = 1;
11307 # undo continuation indentation of a terminal closing token if
11308 # it is the last token before a level decrease. This will allow
11309 # a closing token to line up with its opening counterpart, and
11310 # avoids a indentation jump larger than 1 level.
11311 if ( $types_to_go[$i_terminal] =~ /^[\}\]\)R]$/
11312 && $i_terminal == $ibeg )
11314 my $ci = $ci_levels_to_go[$ibeg];
11315 my $lev = $levels_to_go[$ibeg];
11316 my $next_type = $types_to_go[ $ibeg + 1 ];
11317 my $i_next_nonblank =
11318 ( ( $next_type eq 'b' ) ? $ibeg + 2 : $ibeg + 1 );
11319 if ( $i_next_nonblank <= $max_index_to_go
11320 && $levels_to_go[$i_next_nonblank] < $lev )
11322 $adjust_indentation = 1;
11326 $default_adjust_indentation = $adjust_indentation;
11328 # Now modify default behavior according to user request:
11329 # handle option to indent non-blocks of the form ); }; ];
11330 # But don't do special indentation to something like ')->pack('
11331 if ( !$block_type_to_go[$ibeg] ) {
11332 my $cti = $closing_token_indentation{ $tokens_to_go[$ibeg] };
11334 if ( $i_terminal <= $ibeg + 1
11335 || $is_semicolon_terminated )
11337 $adjust_indentation = 2;
11340 $adjust_indentation = 0;
11343 elsif ( $cti == 2 ) {
11344 if ($is_semicolon_terminated) {
11345 $adjust_indentation = 3;
11348 $adjust_indentation = 0;
11351 elsif ( $cti == 3 ) {
11352 $adjust_indentation = 3;
11356 # handle option to indent blocks
11359 $rOpts->{'indent-closing-brace'}
11361 $i_terminal == $ibeg # isolated terminal '}'
11362 || $is_semicolon_terminated
11366 $adjust_indentation = 3;
11371 # if at ');', '};', '>;', and '];' of a terminal qw quote
11372 elsif ($$rpatterns[0] =~ /^qb*;$/
11373 && $$rfields[0] =~ /^([\)\}\]\>]);$/ )
11375 if ( $closing_token_indentation{$1} == 0 ) {
11376 $adjust_indentation = 1;
11379 $adjust_indentation = 3;
11383 ##########################################################
11384 # Section 2: set indentation according to flag set above
11386 # Select the indentation object to define leading
11387 # whitespace. If we are outdenting something like '} } );'
11388 # then we want to use one level below the last token
11389 # ($i_terminal) in order to get it to fully outdent through
11391 ##########################################################
11394 my $level_end = $levels_to_go[$iend];
11396 if ( $adjust_indentation == 0 ) {
11397 $indentation = $leading_spaces_to_go[$ibeg];
11398 $lev = $levels_to_go[$ibeg];
11400 elsif ( $adjust_indentation == 1 ) {
11401 $indentation = $reduced_spaces_to_go[$i_terminal];
11402 $lev = $levels_to_go[$i_terminal];
11405 # handle indented closing token which aligns with opening token
11406 elsif ( $adjust_indentation == 2 ) {
11408 # handle option to align closing token with opening token
11409 $lev = $levels_to_go[$ibeg];
11411 # calculate spaces needed to align with opening token
11413 get_SPACES($opening_indentation) + $opening_offset;
11415 # Indent less than the previous line.
11417 # Problem: For -lp we don't exactly know what it was if there
11418 # were recoverable spaces sent to the aligner. A good solution
11419 # would be to force a flush of the vertical alignment buffer, so
11420 # that we would know. For now, this rule is used for -lp:
11422 # When the last line did not start with a closing token we will
11423 # be optimistic that the aligner will recover everything wanted.
11425 # This rule will prevent us from breaking a hierarchy of closing
11426 # tokens, and in a worst case will leave a closing paren too far
11427 # indented, but this is better than frequently leaving it not
11429 my $last_spaces = get_SPACES($last_indentation_written);
11430 if ( $last_leading_token !~ /^[\}\]\)]$/ ) {
11432 get_RECOVERABLE_SPACES($last_indentation_written);
11435 # reset the indentation to the new space count if it works
11436 # only options are all or none: nothing in-between looks good
11437 $lev = $levels_to_go[$ibeg];
11438 if ( $space_count < $last_spaces ) {
11439 if ($rOpts_line_up_parentheses) {
11440 my $lev = $levels_to_go[$ibeg];
11442 new_lp_indentation_item( $space_count, $lev, 0, 0, 0 );
11445 $indentation = $space_count;
11449 # revert to default if it doesnt work
11451 $space_count = leading_spaces_to_go($ibeg);
11452 if ( $default_adjust_indentation == 0 ) {
11453 $indentation = $leading_spaces_to_go[$ibeg];
11455 elsif ( $default_adjust_indentation == 1 ) {
11456 $indentation = $reduced_spaces_to_go[$i_terminal];
11457 $lev = $levels_to_go[$i_terminal];
11462 # Full indentaion of closing tokens (-icb and -icp or -cti=2)
11465 # handle -icb (indented closing code block braces)
11466 # Updated method for indented block braces: indent one full level if
11467 # there is no continuation indentation. This will occur for major
11468 # structures such as sub, if, else, but not for things like map
11471 # Note: only code blocks without continuation indentation are
11472 # handled here (if, else, unless, ..). In the following snippet,
11473 # the terminal brace of the sort block will have continuation
11474 # indentation as shown so it will not be handled by the coding
11475 # here. We would have to undo the continuation indentation to do
11476 # this, but it probably looks ok as is. This is a possible future
11477 # update for semicolon terminated lines.
11479 # if ($sortby eq 'date' or $sortby eq 'size') {
11481 # $file_data{$a}{$sortby} <=> $file_data{$b}{$sortby}
11486 if ( $block_type_to_go[$ibeg]
11487 && $ci_levels_to_go[$i_terminal] == 0 )
11489 my $spaces = get_SPACES( $leading_spaces_to_go[$i_terminal] );
11490 $indentation = $spaces + $rOpts_indent_columns;
11492 # NOTE: for -lp we could create a new indentation object, but
11493 # there is probably no need to do it
11496 # handle -icp and any -icb block braces which fall through above
11497 # test such as the 'sort' block mentioned above.
11500 # There are currently two ways to handle -icp...
11501 # One way is to use the indentation of the previous line:
11502 # $indentation = $last_indentation_written;
11504 # The other way is to use the indentation that the previous line
11505 # would have had if it hadn't been adjusted:
11506 $indentation = $last_unadjusted_indentation;
11508 # Current method: use the minimum of the two. This avoids
11509 # inconsistent indentation.
11510 if ( get_SPACES($last_indentation_written) <
11511 get_SPACES($indentation) )
11513 $indentation = $last_indentation_written;
11517 # use previous indentation but use own level
11518 # to cause list to be flushed properly
11519 $lev = $levels_to_go[$ibeg];
11522 # remember indentation except for multi-line quotes, which get
11524 unless ( $ibeg == 0 && $starting_in_quote ) {
11525 $last_indentation_written = $indentation;
11526 $last_unadjusted_indentation = $leading_spaces_to_go[$ibeg];
11527 $last_leading_token = $tokens_to_go[$ibeg];
11530 # be sure lines with leading closing tokens are not outdented more
11531 # than the line which contained the corresponding opening token.
11533 #############################################################
11534 # updated per bug report in alex_bug.pl: we must not
11535 # mess with the indentation of closing logical braces so
11536 # we must treat something like '} else {' as if it were
11537 # an isolated brace my $is_isolated_block_brace = (
11538 # $iend == $ibeg ) && $block_type_to_go[$ibeg];
11539 my $is_isolated_block_brace = $block_type_to_go[$ibeg]
11540 && ( $iend == $ibeg
11541 || $is_if_elsif_else_unless_while_until_for_foreach{
11542 $block_type_to_go[$ibeg] } );
11543 #############################################################
11544 if ( !$is_isolated_block_brace && defined($opening_indentation) ) {
11545 if ( get_SPACES($opening_indentation) > get_SPACES($indentation) ) {
11546 $indentation = $opening_indentation;
11550 # remember the indentation of each line of this batch
11551 push @{$rindentation_list}, $indentation;
11553 # outdent lines with certain leading tokens...
11556 # must be first word of this batch
11562 # certain leading keywords if requested
11564 $rOpts->{'outdent-keywords'}
11565 && $types_to_go[$ibeg] eq 'k'
11566 && $outdent_keyword{ $tokens_to_go[$ibeg] }
11569 # or labels if requested
11570 || ( $rOpts->{'outdent-labels'} && $types_to_go[$ibeg] eq 'J' )
11572 # or static block comments if requested
11573 || ( $types_to_go[$ibeg] eq '#'
11574 && $rOpts->{'outdent-static-block-comments'}
11575 && $is_static_block_comment )
11580 my $space_count = leading_spaces_to_go($ibeg);
11581 if ( $space_count > 0 ) {
11582 $space_count -= $rOpts_continuation_indentation;
11583 $is_outdented_line = 1;
11584 if ( $space_count < 0 ) { $space_count = 0 }
11586 # do not promote a spaced static block comment to non-spaced;
11587 # this is not normally necessary but could be for some
11588 # unusual user inputs (such as -ci = -i)
11589 if ( $types_to_go[$ibeg] eq '#' && $space_count == 0 ) {
11593 if ($rOpts_line_up_parentheses) {
11595 new_lp_indentation_item( $space_count, $lev, 0, 0, 0 );
11598 $indentation = $space_count;
11603 return ( $indentation, $lev, $level_end, $terminal_type,
11604 $is_semicolon_terminated, $is_outdented_line );
11608 sub set_vertical_tightness_flags {
11610 my ( $n, $n_last_line, $ibeg, $iend, $ri_first, $ri_last ) = @_;
11612 # Define vertical tightness controls for the nth line of a batch.
11613 # We create an array of parameters which tell the vertical aligner
11614 # if we should combine this line with the next line to achieve the
11615 # desired vertical tightness. The array of parameters contains:
11617 # [0] type: 1=is opening tok 2=is closing tok 3=is opening block brace
11618 # [1] flag: if opening: 1=no multiple steps, 2=multiple steps ok
11619 # if closing: spaces of padding to use
11620 # [2] sequence number of container
11621 # [3] valid flag: do not append if this flag is false. Will be
11622 # true if appropriate -vt flag is set. Otherwise, Will be
11623 # made true only for 2 line container in parens with -lp
11625 # These flags are used by sub set_leading_whitespace in
11626 # the vertical aligner
11628 my $rvertical_tightness_flags = [ 0, 0, 0, 0, 0, 0 ];
11630 # For non-BLOCK tokens, we will need to examine the next line
11631 # too, so we won't consider the last line.
11632 if ( $n < $n_last_line ) {
11634 # see if last token is an opening token...not a BLOCK...
11635 my $ibeg_next = $$ri_first[ $n + 1 ];
11636 my $token_end = $tokens_to_go[$iend];
11637 my $iend_next = $$ri_last[ $n + 1 ];
11639 $type_sequence_to_go[$iend]
11640 && !$block_type_to_go[$iend]
11641 && $is_opening_token{$token_end}
11643 $opening_vertical_tightness{$token_end} > 0
11645 # allow 2-line method call to be closed up
11646 || ( $rOpts_line_up_parentheses
11647 && $token_end eq '('
11649 && $types_to_go[ $iend - 1 ] ne 'b' )
11654 # avoid multiple jumps in nesting depth in one line if
11656 my $ovt = $opening_vertical_tightness{$token_end};
11657 my $iend_next = $$ri_last[ $n + 1 ];
11660 && ( $nesting_depth_to_go[ $iend_next + 1 ] !=
11661 $nesting_depth_to_go[$ibeg_next] )
11665 # If -vt flag has not been set, mark this as invalid
11666 # and aligner will validate it if it sees the closing paren
11668 my $valid_flag = $ovt;
11669 @{$rvertical_tightness_flags} =
11670 ( 1, $ovt, $type_sequence_to_go[$iend], $valid_flag );
11674 # see if first token of next line is a closing token...
11675 # ..and be sure this line does not have a side comment
11676 my $token_next = $tokens_to_go[$ibeg_next];
11677 if ( $type_sequence_to_go[$ibeg_next]
11678 && !$block_type_to_go[$ibeg_next]
11679 && $is_closing_token{$token_next}
11680 && $types_to_go[$iend] !~ '#' ) # for safety, shouldn't happen!
11682 my $ovt = $opening_vertical_tightness{$token_next};
11683 my $cvt = $closing_vertical_tightness{$token_next};
11686 # never append a trailing line like )->pack(
11687 # because it will throw off later alignment
11689 $nesting_depth_to_go[$ibeg_next] ==
11690 $nesting_depth_to_go[ $iend_next + 1 ] + 1
11695 $container_environment_to_go[$ibeg_next] ne 'LIST'
11699 # allow closing up 2-line method calls
11700 || ( $rOpts_line_up_parentheses
11701 && $token_next eq ')' )
11708 # decide which trailing closing tokens to append..
11710 if ( $cvt == 2 || $iend_next == $ibeg_next ) { $ok = 1 }
11712 my $str = join( '',
11713 @types_to_go[ $ibeg_next + 1 .. $ibeg_next + 2 ] );
11715 # append closing token if followed by comment or ';'
11716 if ( $str =~ /^b?[#;]/ ) { $ok = 1 }
11720 my $valid_flag = $cvt;
11721 @{$rvertical_tightness_flags} = (
11723 $tightness{$token_next} == 2 ? 0 : 1,
11724 $type_sequence_to_go[$ibeg_next], $valid_flag,
11730 # Opening Token Right
11731 # If requested, move an isolated trailing opening token to the end of
11732 # the previous line which ended in a comma. We could do this
11733 # in sub recombine_breakpoints but that would cause problems
11734 # with -lp formatting. The problem is that indentation will
11735 # quickly move far to the right in nested expressions. By
11736 # doing it after indentation has been set, we avoid changes
11737 # to the indentation. Actual movement of the token takes place
11738 # in sub write_leader_and_string.
11740 $opening_token_right{ $tokens_to_go[$ibeg_next] }
11742 # previous line is not opening
11743 # (use -sot to combine with it)
11744 && !$is_opening_token{$token_end}
11746 # previous line ended in one of these
11747 # (add other cases if necessary; '=>' and '.' are not necessary
11748 ##&& ($is_opening_token{$token_end} || $token_end eq ',')
11749 && !$block_type_to_go[$ibeg_next]
11751 # this is a line with just an opening token
11752 && ( $iend_next == $ibeg_next
11753 || $iend_next == $ibeg_next + 2
11754 && $types_to_go[$iend_next] eq '#' )
11756 # looks bad if we align vertically with the wrong container
11757 && $tokens_to_go[$ibeg] ne $tokens_to_go[$ibeg_next]
11760 my $valid_flag = 1;
11761 my $spaces = ( $types_to_go[ $ibeg_next - 1 ] eq 'b' ) ? 1 : 0;
11762 @{$rvertical_tightness_flags} =
11763 ( 2, $spaces, $type_sequence_to_go[$ibeg_next], $valid_flag, );
11766 # Stacking of opening and closing tokens
11768 my $token_beg_next = $tokens_to_go[$ibeg_next];
11770 # patch to make something like 'qw(' behave like an opening paren
11772 if ( $types_to_go[$ibeg_next] eq 'q' ) {
11773 if ( $token_beg_next =~ /^qw\s*([\[\(\{])$/ ) {
11774 $token_beg_next = $1;
11778 if ( $is_closing_token{$token_end}
11779 && $is_closing_token{$token_beg_next} )
11781 $stackable = $stack_closing_token{$token_beg_next}
11782 unless ( $block_type_to_go[$ibeg_next] )
11783 ; # shouldn't happen; just checking
11785 elsif ($is_opening_token{$token_end}
11786 && $is_opening_token{$token_beg_next} )
11788 $stackable = $stack_opening_token{$token_beg_next}
11789 unless ( $block_type_to_go[$ibeg_next] )
11790 ; # shouldn't happen; just checking
11795 my $is_semicolon_terminated;
11796 if ( $n + 1 == $n_last_line ) {
11797 my ( $terminal_type, $i_terminal ) = terminal_type(
11798 \@types_to_go, \@block_type_to_go,
11799 $ibeg_next, $iend_next
11801 $is_semicolon_terminated = $terminal_type eq ';'
11802 && $nesting_depth_to_go[$iend_next] <
11803 $nesting_depth_to_go[$ibeg_next];
11806 # this must be a line with just an opening token
11807 # or end in a semicolon
11809 $is_semicolon_terminated
11810 || ( $iend_next == $ibeg_next
11811 || $iend_next == $ibeg_next + 2
11812 && $types_to_go[$iend_next] eq '#' )
11815 my $valid_flag = 1;
11816 my $spaces = ( $types_to_go[ $ibeg_next - 1 ] eq 'b' ) ? 1 : 0;
11817 @{$rvertical_tightness_flags} =
11818 ( 2, $spaces, $type_sequence_to_go[$ibeg_next], $valid_flag,
11824 # Check for a last line with isolated opening BLOCK curly
11825 elsif ($rOpts_block_brace_vertical_tightness
11827 && $types_to_go[$iend] eq '{'
11828 && $block_type_to_go[$iend] =~
11829 /$block_brace_vertical_tightness_pattern/o )
11831 @{$rvertical_tightness_flags} =
11832 ( 3, $rOpts_block_brace_vertical_tightness, 0, 1 );
11835 # pack in the sequence numbers of the ends of this line
11836 $rvertical_tightness_flags->[4] = get_seqno($ibeg);
11837 $rvertical_tightness_flags->[5] = get_seqno($iend);
11838 return $rvertical_tightness_flags;
11843 # get opening and closing sequence numbers of a token for the vertical
11844 # aligner. Assign qw quotes a value to allow qw opening and closing tokens
11845 # to be treated somewhat like opening and closing tokens for stacking
11846 # tokens by the vertical aligner.
11848 my $seqno = $type_sequence_to_go[$ii];
11849 if ( $types_to_go[$ii] eq 'q' ) {
11852 $seqno = $SEQ_QW if ( $tokens_to_go[$ii] =~ /^qw\s*[\(\{\[]/ );
11855 if ( !$ending_in_quote ) {
11856 $seqno = $SEQ_QW if ( $tokens_to_go[$ii] =~ /[\)\}\]]$/ );
11864 my %is_vertical_alignment_type;
11865 my %is_vertical_alignment_keyword;
11870 = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=
11871 { ? : => =~ && || // ~~ !~~
11873 @is_vertical_alignment_type{@_} = (1) x scalar(@_);
11875 @_ = qw(if unless and or err eq ne for foreach while until);
11876 @is_vertical_alignment_keyword{@_} = (1) x scalar(@_);
11879 sub set_vertical_alignment_markers {
11881 # This routine takes the first step toward vertical alignment of the
11882 # lines of output text. It looks for certain tokens which can serve as
11883 # vertical alignment markers (such as an '=').
11885 # Method: We look at each token $i in this output batch and set
11886 # $matching_token_to_go[$i] equal to those tokens at which we would
11887 # accept vertical alignment.
11889 # nothing to do if we aren't allowed to change whitespace
11890 if ( !$rOpts_add_whitespace ) {
11891 for my $i ( 0 .. $max_index_to_go ) {
11892 $matching_token_to_go[$i] = '';
11897 my ( $ri_first, $ri_last ) = @_;
11899 # remember the index of last nonblank token before any sidecomment
11900 my $i_terminal = $max_index_to_go;
11901 if ( $types_to_go[$i_terminal] eq '#' ) {
11902 if ( $i_terminal > 0 && $types_to_go[ --$i_terminal ] eq 'b' ) {
11903 if ( $i_terminal > 0 ) { --$i_terminal }
11907 # look at each line of this batch..
11908 my $last_vertical_alignment_before_index;
11909 my $vert_last_nonblank_type;
11910 my $vert_last_nonblank_token;
11911 my $vert_last_nonblank_block_type;
11912 my $max_line = @$ri_first - 1;
11913 my ( $i, $type, $token, $block_type, $alignment_type );
11914 my ( $ibeg, $iend, $line );
11916 foreach $line ( 0 .. $max_line ) {
11917 $ibeg = $$ri_first[$line];
11918 $iend = $$ri_last[$line];
11919 $last_vertical_alignment_before_index = -1;
11920 $vert_last_nonblank_type = '';
11921 $vert_last_nonblank_token = '';
11922 $vert_last_nonblank_block_type = '';
11924 # look at each token in this output line..
11925 foreach $i ( $ibeg .. $iend ) {
11926 $alignment_type = '';
11927 $type = $types_to_go[$i];
11928 $block_type = $block_type_to_go[$i];
11929 $token = $tokens_to_go[$i];
11931 # check for flag indicating that we should not align
11933 if ( $matching_token_to_go[$i] ) {
11934 $matching_token_to_go[$i] = '';
11938 #--------------------------------------------------------
11939 # First see if we want to align BEFORE this token
11940 #--------------------------------------------------------
11942 # The first possible token that we can align before
11943 # is index 2 because: 1) it doesn't normally make sense to
11944 # align before the first token and 2) the second
11945 # token must be a blank if we are to align before
11947 if ( $i < $ibeg + 2 ) { }
11949 # must follow a blank token
11950 elsif ( $types_to_go[ $i - 1 ] ne 'b' ) { }
11952 # align a side comment --
11953 elsif ( $type eq '#' ) {
11957 # it is a static side comment
11959 $rOpts->{'static-side-comments'}
11960 && $token =~ /$static_side_comment_pattern/o
11963 # or a closing side comment
11964 || ( $vert_last_nonblank_block_type
11966 /$closing_side_comment_prefix_pattern/o )
11969 $alignment_type = $type;
11970 } ## Example of a static side comment
11973 # otherwise, do not align two in a row to create a
11975 elsif ( $last_vertical_alignment_before_index == $i - 2 ) { }
11977 # align before one of these keywords
11978 # (within a line, since $i>1)
11979 elsif ( $type eq 'k' ) {
11981 # /^(if|unless|and|or|eq|ne)$/
11982 if ( $is_vertical_alignment_keyword{$token} ) {
11983 $alignment_type = $token;
11987 # align before one of these types..
11988 # Note: add '.' after new vertical aligner is operational
11989 elsif ( $is_vertical_alignment_type{$type} ) {
11990 $alignment_type = $token;
11992 # Do not align a terminal token. Although it might
11993 # occasionally look ok to do this, it has been found to be
11994 # a good general rule. The main problems are:
11995 # (1) that the terminal token (such as an = or :) might get
11996 # moved far to the right where it is hard to see because
11997 # nothing follows it, and
11998 # (2) doing so may prevent other good alignments.
11999 if ( $i == $iend || $i >= $i_terminal ) {
12000 $alignment_type = "";
12003 # Do not align leading ': (' or '. ('. This would prevent
12004 # alignment in something like the following:
12006 # ( $input_line_number < 10 ) ? " "
12007 # : ( $input_line_number < 100 ) ? " "
12011 # ( $case_matters ? $accessor : " lc($accessor) " )
12012 # . ( $yesno ? " eq " : " ne " )
12013 if ( $i == $ibeg + 2
12014 && $types_to_go[$ibeg] =~ /^[\.\:]$/
12015 && $types_to_go[ $i - 1 ] eq 'b' )
12017 $alignment_type = "";
12020 # For a paren after keyword, only align something like this:
12022 # elsif ( $b ) { &b }
12023 if ( $token eq '(' && $vert_last_nonblank_type eq 'k' ) {
12024 $alignment_type = ""
12025 unless $vert_last_nonblank_token =~
12026 /^(if|unless|elsif)$/;
12029 # be sure the alignment tokens are unique
12030 # This didn't work well: reason not determined
12031 # if ($token ne $type) {$alignment_type .= $type}
12034 # NOTE: This is deactivated because it causes the previous
12035 # if/elsif alignment to fail
12036 #elsif ( $type eq '}' && $token eq '}' && $block_type_to_go[$i])
12037 #{ $alignment_type = $type; }
12039 if ($alignment_type) {
12040 $last_vertical_alignment_before_index = $i;
12043 #--------------------------------------------------------
12044 # Next see if we want to align AFTER the previous nonblank
12045 #--------------------------------------------------------
12047 # We want to line up ',' and interior ';' tokens, with the added
12048 # space AFTER these tokens. (Note: interior ';' is included
12049 # because it may occur in short blocks).
12052 # we haven't already set it
12055 # and its not the first token of the line
12058 # and it follows a blank
12059 && $types_to_go[ $i - 1 ] eq 'b'
12061 # and previous token IS one of these:
12062 && ( $vert_last_nonblank_type =~ /^[\,\;]$/ )
12064 # and it's NOT one of these
12065 && ( $type !~ /^[b\#\)\]\}]$/ )
12067 # then go ahead and align
12071 $alignment_type = $vert_last_nonblank_type;
12074 #--------------------------------------------------------
12075 # then store the value
12076 #--------------------------------------------------------
12077 $matching_token_to_go[$i] = $alignment_type;
12078 if ( $type ne 'b' ) {
12079 $vert_last_nonblank_type = $type;
12080 $vert_last_nonblank_token = $token;
12081 $vert_last_nonblank_block_type = $block_type;
12088 sub terminal_type {
12090 # returns type of last token on this line (terminal token), as follows:
12091 # returns # for a full-line comment
12092 # returns ' ' for a blank line
12093 # otherwise returns final token type
12095 my ( $rtype, $rblock_type, $ibeg, $iend ) = @_;
12097 # check for full-line comment..
12098 if ( $$rtype[$ibeg] eq '#' ) {
12099 return wantarray ? ( $$rtype[$ibeg], $ibeg ) : $$rtype[$ibeg];
12103 # start at end and walk bakwards..
12104 for ( my $i = $iend ; $i >= $ibeg ; $i-- ) {
12106 # skip past any side comment and blanks
12107 next if ( $$rtype[$i] eq 'b' );
12108 next if ( $$rtype[$i] eq '#' );
12110 # found it..make sure it is a BLOCK termination,
12111 # but hide a terminal } after sort/grep/map because it is not
12112 # necessarily the end of the line. (terminal.t)
12113 my $terminal_type = $$rtype[$i];
12115 $terminal_type eq '}'
12116 && ( !$$rblock_type[$i]
12117 || ( $is_sort_map_grep_eval_do{ $$rblock_type[$i] } ) )
12120 $terminal_type = 'b';
12122 return wantarray ? ( $terminal_type, $i ) : $terminal_type;
12126 return wantarray ? ( ' ', $ibeg ) : ' ';
12131 my %is_good_keyword_breakpoint;
12132 my %is_lt_gt_le_ge;
12134 sub set_bond_strengths {
12138 @_ = qw(if unless while until for foreach);
12139 @is_good_keyword_breakpoint{@_} = (1) x scalar(@_);
12141 @_ = qw(lt gt le ge);
12142 @is_lt_gt_le_ge{@_} = (1) x scalar(@_);
12144 ###############################################################
12145 # NOTE: NO_BREAK's set here are HINTS which may not be honored;
12146 # essential NO_BREAKS's must be enforced in section 2, below.
12147 ###############################################################
12149 # adding NEW_TOKENS: add a left and right bond strength by
12150 # mimmicking what is done for an existing token type. You
12151 # can skip this step at first and take the default, then
12152 # tweak later to get desired results.
12154 # The bond strengths should roughly follow precenence order where
12155 # possible. If you make changes, please check the results very
12156 # carefully on a variety of scripts.
12158 # no break around possible filehandle
12159 $left_bond_strength{'Z'} = NO_BREAK;
12160 $right_bond_strength{'Z'} = NO_BREAK;
12162 # never put a bare word on a new line:
12163 # example print (STDERR, "bla"); will fail with break after (
12164 $left_bond_strength{'w'} = NO_BREAK;
12166 # blanks always have infinite strength to force breaks after real tokens
12167 $right_bond_strength{'b'} = NO_BREAK;
12169 # try not to break on exponentation
12170 @_ = qw" ** .. ... <=> ";
12171 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12172 @right_bond_strength{@_} = (STRONG) x scalar(@_);
12174 # The comma-arrow has very low precedence but not a good break point
12175 $left_bond_strength{'=>'} = NO_BREAK;
12176 $right_bond_strength{'=>'} = NOMINAL;
12178 # ok to break after label
12179 $left_bond_strength{'J'} = NO_BREAK;
12180 $right_bond_strength{'J'} = NOMINAL;
12181 $left_bond_strength{'j'} = STRONG;
12182 $right_bond_strength{'j'} = STRONG;
12183 $left_bond_strength{'A'} = STRONG;
12184 $right_bond_strength{'A'} = STRONG;
12186 $left_bond_strength{'->'} = STRONG;
12187 $right_bond_strength{'->'} = VERY_STRONG;
12189 # breaking AFTER modulus operator is ok:
12191 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12192 @right_bond_strength{@_} =
12193 ( 0.1 * NOMINAL + 0.9 * STRONG ) x scalar(@_);
12195 # Break AFTER math operators * and /
12197 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12198 @right_bond_strength{@_} = (NOMINAL) x scalar(@_);
12200 # Break AFTER weakest math operators + and -
12201 # Make them weaker than * but a bit stronger than '.'
12203 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12204 @right_bond_strength{@_} =
12205 ( 0.91 * NOMINAL + 0.09 * WEAK ) x scalar(@_);
12207 # breaking BEFORE these is just ok:
12209 @right_bond_strength{@_} = (STRONG) x scalar(@_);
12210 @left_bond_strength{@_} = (NOMINAL) x scalar(@_);
12212 # breaking before the string concatenation operator seems best
12213 # because it can be hard to see at the end of a line
12214 $right_bond_strength{'.'} = STRONG;
12215 $left_bond_strength{'.'} = 0.9 * NOMINAL + 0.1 * WEAK;
12218 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12219 @right_bond_strength{@_} = (NOMINAL) x scalar(@_);
12221 # make these a little weaker than nominal so that they get
12222 # favored for end-of-line characters
12223 @_ = qw"!= == =~ !~ ~~ !~~";
12224 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12225 @right_bond_strength{@_} =
12226 ( 0.9 * NOMINAL + 0.1 * WEAK ) x scalar(@_);
12228 # break AFTER these
12229 @_ = qw" < > | & >= <=";
12230 @left_bond_strength{@_} = (VERY_STRONG) x scalar(@_);
12231 @right_bond_strength{@_} =
12232 ( 0.8 * NOMINAL + 0.2 * WEAK ) x scalar(@_);
12234 # breaking either before or after a quote is ok
12235 # but bias for breaking before a quote
12236 $left_bond_strength{'Q'} = NOMINAL;
12237 $right_bond_strength{'Q'} = NOMINAL + 0.02;
12238 $left_bond_strength{'q'} = NOMINAL;
12239 $right_bond_strength{'q'} = NOMINAL;
12241 # starting a line with a keyword is usually ok
12242 $left_bond_strength{'k'} = NOMINAL;
12244 # we usually want to bond a keyword strongly to what immediately
12245 # follows, rather than leaving it stranded at the end of a line
12246 $right_bond_strength{'k'} = STRONG;
12248 $left_bond_strength{'G'} = NOMINAL;
12249 $right_bond_strength{'G'} = STRONG;
12251 # it is good to break AFTER various assignment operators
12253 = **= += *= &= <<= &&=
12254 -= /= |= >>= ||= //=
12258 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12259 @right_bond_strength{@_} =
12260 ( 0.4 * WEAK + 0.6 * VERY_WEAK ) x scalar(@_);
12262 # break BEFORE '&&' and '||' and '//'
12263 # set strength of '||' to same as '=' so that chains like
12264 # $a = $b || $c || $d will break before the first '||'
12265 $right_bond_strength{'||'} = NOMINAL;
12266 $left_bond_strength{'||'} = $right_bond_strength{'='};
12268 # same thing for '//'
12269 $right_bond_strength{'//'} = NOMINAL;
12270 $left_bond_strength{'//'} = $right_bond_strength{'='};
12272 # set strength of && a little higher than ||
12273 $right_bond_strength{'&&'} = NOMINAL;
12274 $left_bond_strength{'&&'} = $left_bond_strength{'||'} + 0.1;
12276 $left_bond_strength{';'} = VERY_STRONG;
12277 $right_bond_strength{';'} = VERY_WEAK;
12278 $left_bond_strength{'f'} = VERY_STRONG;
12280 # make right strength of for ';' a little less than '='
12281 # to make for contents break after the ';' to avoid this:
12282 # for ( $j = $number_of_fields - 1 ; $j < $item_count ; $j +=
12283 # $number_of_fields )
12284 # and make it weaker than ',' and 'and' too
12285 $right_bond_strength{'f'} = VERY_WEAK - 0.03;
12287 # The strengths of ?/: should be somewhere between
12288 # an '=' and a quote (NOMINAL),
12289 # make strength of ':' slightly less than '?' to help
12290 # break long chains of ? : after the colons
12291 $left_bond_strength{':'} = 0.4 * WEAK + 0.6 * NOMINAL;
12292 $right_bond_strength{':'} = NO_BREAK;
12293 $left_bond_strength{'?'} = $left_bond_strength{':'} + 0.01;
12294 $right_bond_strength{'?'} = NO_BREAK;
12296 $left_bond_strength{','} = VERY_STRONG;
12297 $right_bond_strength{','} = VERY_WEAK;
12299 # Set bond strengths of certain keywords
12300 # make 'or', 'err', 'and' slightly weaker than a ','
12301 $left_bond_strength{'and'} = VERY_WEAK - 0.01;
12302 $left_bond_strength{'or'} = VERY_WEAK - 0.02;
12303 $left_bond_strength{'err'} = VERY_WEAK - 0.02;
12304 $left_bond_strength{'xor'} = NOMINAL;
12305 $right_bond_strength{'and'} = NOMINAL;
12306 $right_bond_strength{'or'} = NOMINAL;
12307 $right_bond_strength{'err'} = NOMINAL;
12308 $right_bond_strength{'xor'} = STRONG;
12311 # patch-its always ok to break at end of line
12312 $nobreak_to_go[$max_index_to_go] = 0;
12314 # adding a small 'bias' to strengths is a simple way to make a line
12315 # break at the first of a sequence of identical terms. For example,
12316 # to force long string of conditional operators to break with
12317 # each line ending in a ':', we can add a small number to the bond
12318 # strength of each ':'
12319 my $colon_bias = 0;
12326 my $code_bias = -.01;
12330 my $last_nonblank_type = $type;
12331 my $last_nonblank_token = $token;
12332 my $delta_bias = 0.0001;
12333 my $list_str = $left_bond_strength{'?'};
12335 my ( $block_type, $i_next, $i_next_nonblank, $next_nonblank_token,
12336 $next_nonblank_type, $next_token, $next_type, $total_nesting_depth,
12339 # preliminary loop to compute bond strengths
12340 for ( my $i = 0 ; $i <= $max_index_to_go ; $i++ ) {
12341 $last_type = $type;
12342 if ( $type ne 'b' ) {
12343 $last_nonblank_type = $type;
12344 $last_nonblank_token = $token;
12346 $type = $types_to_go[$i];
12348 # strength on both sides of a blank is the same
12349 if ( $type eq 'b' && $last_type ne 'b' ) {
12350 $bond_strength_to_go[$i] = $bond_strength_to_go[ $i - 1 ];
12354 $token = $tokens_to_go[$i];
12355 $block_type = $block_type_to_go[$i];
12357 $next_type = $types_to_go[$i_next];
12358 $next_token = $tokens_to_go[$i_next];
12359 $total_nesting_depth = $nesting_depth_to_go[$i_next];
12360 $i_next_nonblank = ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
12361 $next_nonblank_type = $types_to_go[$i_next_nonblank];
12362 $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
12364 # Some token chemistry... The decision about where to break a
12365 # line depends upon a "bond strength" between tokens. The LOWER
12366 # the bond strength, the MORE likely a break. The strength
12367 # values are based on trial-and-error, and need to be tweaked
12368 # occasionally to get desired results. Things to keep in mind
12370 # 1. relative strengths are important. small differences
12371 # in strengths can make big formatting differences.
12372 # 2. each indentation level adds one unit of bond strength
12373 # 3. a value of NO_BREAK makes an unbreakable bond
12374 # 4. a value of VERY_WEAK is the strength of a ','
12375 # 5. values below NOMINAL are considered ok break points
12376 # 6. values above NOMINAL are considered poor break points
12377 # We are computing the strength of the bond between the current
12378 # token and the NEXT token.
12379 my $bond_str = VERY_STRONG; # a default, high strength
12381 #---------------------------------------------------------------
12383 # use minimum of left and right bond strengths if defined;
12384 # digraphs and trigraphs like to break on their left
12385 #---------------------------------------------------------------
12386 my $bsr = $right_bond_strength{$type};
12388 if ( !defined($bsr) ) {
12390 if ( $is_digraph{$type} || $is_trigraph{$type} ) {
12394 $bsr = VERY_STRONG;
12398 # define right bond strengths of certain keywords
12399 if ( $type eq 'k' && defined( $right_bond_strength{$token} ) ) {
12400 $bsr = $right_bond_strength{$token};
12402 elsif ( $token eq 'ne' or $token eq 'eq' ) {
12405 my $bsl = $left_bond_strength{$next_nonblank_type};
12407 # set terminal bond strength to the nominal value
12408 # this will cause good preceding breaks to be retained
12409 if ( $i_next_nonblank > $max_index_to_go ) {
12413 if ( !defined($bsl) ) {
12415 if ( $is_digraph{$next_nonblank_type}
12416 || $is_trigraph{$next_nonblank_type} )
12421 $bsl = VERY_STRONG;
12425 # define right bond strengths of certain keywords
12426 if ( $next_nonblank_type eq 'k'
12427 && defined( $left_bond_strength{$next_nonblank_token} ) )
12429 $bsl = $left_bond_strength{$next_nonblank_token};
12431 elsif ($next_nonblank_token eq 'ne'
12432 or $next_nonblank_token eq 'eq' )
12436 elsif ( $is_lt_gt_le_ge{$next_nonblank_token} ) {
12437 $bsl = 0.9 * NOMINAL + 0.1 * STRONG;
12440 # Note: it might seem that we would want to keep a NO_BREAK if
12441 # either token has this value. This didn't work, because in an
12442 # arrow list, it prevents the comma from separating from the
12443 # following bare word (which is probably quoted by its arrow).
12444 # So necessary NO_BREAK's have to be handled as special cases
12445 # in the final section.
12446 $bond_str = ( $bsr < $bsl ) ? $bsr : $bsl;
12447 my $bond_str_1 = $bond_str;
12449 #---------------------------------------------------------------
12452 #---------------------------------------------------------------
12454 # allow long lines before final { in an if statement, as in:
12459 # Otherwise, the line before the { tends to be too short.
12460 if ( $type eq ')' ) {
12461 if ( $next_nonblank_type eq '{' ) {
12462 $bond_str = VERY_WEAK + 0.03;
12466 elsif ( $type eq '(' ) {
12467 if ( $next_nonblank_type eq '{' ) {
12468 $bond_str = NOMINAL;
12472 # break on something like '} (', but keep this stronger than a ','
12473 # example is in 'howe.pl'
12474 elsif ( $type eq 'R' or $type eq '}' ) {
12475 if ( $next_nonblank_type eq '(' ) {
12476 $bond_str = 0.8 * VERY_WEAK + 0.2 * WEAK;
12480 #-----------------------------------------------------------------
12481 # adjust bond strength bias
12482 #-----------------------------------------------------------------
12484 elsif ( $type eq 'f' ) {
12485 $bond_str += $f_bias;
12486 $f_bias += $delta_bias;
12489 # in long ?: conditionals, bias toward just one set per line (colon.t)
12490 elsif ( $type eq ':' ) {
12491 if ( !$want_break_before{$type} ) {
12492 $bond_str += $colon_bias;
12493 $colon_bias += $delta_bias;
12497 if ( $next_nonblank_type eq ':'
12498 && $want_break_before{$next_nonblank_type} )
12500 $bond_str += $colon_bias;
12501 $colon_bias += $delta_bias;
12504 # if leading '.' is used, align all but 'short' quotes;
12505 # the idea is to not place something like "\n" on a single line.
12506 elsif ( $next_nonblank_type eq '.' ) {
12507 if ( $want_break_before{'.'} ) {
12509 $last_nonblank_type eq '.'
12512 $rOpts_short_concatenation_item_length )
12513 && ( $token !~ /^[\)\]\}]$/ )
12516 $dot_bias += $delta_bias;
12518 $bond_str += $dot_bias;
12521 elsif ($next_nonblank_type eq '&&'
12522 && $want_break_before{$next_nonblank_type} )
12524 $bond_str += $amp_bias;
12525 $amp_bias += $delta_bias;
12527 elsif ($next_nonblank_type eq '||'
12528 && $want_break_before{$next_nonblank_type} )
12530 $bond_str += $bar_bias;
12531 $bar_bias += $delta_bias;
12533 elsif ( $next_nonblank_type eq 'k' ) {
12535 if ( $next_nonblank_token eq 'and'
12536 && $want_break_before{$next_nonblank_token} )
12538 $bond_str += $and_bias;
12539 $and_bias += $delta_bias;
12541 elsif ($next_nonblank_token =~ /^(or|err)$/
12542 && $want_break_before{$next_nonblank_token} )
12544 $bond_str += $or_bias;
12545 $or_bias += $delta_bias;
12548 # FIXME: needs more testing
12549 elsif ( $is_keyword_returning_list{$next_nonblank_token} ) {
12550 $bond_str = $list_str if ( $bond_str > $list_str );
12552 elsif ( $token eq 'err'
12553 && !$want_break_before{$token} )
12555 $bond_str += $or_bias;
12556 $or_bias += $delta_bias;
12561 && !$want_break_before{$type} )
12563 $bond_str += $colon_bias;
12564 $colon_bias += $delta_bias;
12566 elsif ( $type eq '&&'
12567 && !$want_break_before{$type} )
12569 $bond_str += $amp_bias;
12570 $amp_bias += $delta_bias;
12572 elsif ( $type eq '||'
12573 && !$want_break_before{$type} )
12575 $bond_str += $bar_bias;
12576 $bar_bias += $delta_bias;
12578 elsif ( $type eq 'k' ) {
12580 if ( $token eq 'and'
12581 && !$want_break_before{$token} )
12583 $bond_str += $and_bias;
12584 $and_bias += $delta_bias;
12586 elsif ( $token eq 'or'
12587 && !$want_break_before{$token} )
12589 $bond_str += $or_bias;
12590 $or_bias += $delta_bias;
12594 # keep matrix and hash indices together
12595 # but make them a little below STRONG to allow breaking open
12596 # something like {'some-word'}{'some-very-long-word'} at the }{
12598 if ( ( $type eq ']' or $type eq 'R' )
12599 && ( $next_nonblank_type eq '[' or $next_nonblank_type eq 'L' )
12602 $bond_str = 0.9 * STRONG + 0.1 * NOMINAL;
12605 if ( $next_nonblank_token =~ /^->/ ) {
12607 # increase strength to the point where a break in the following
12608 # will be after the opening paren rather than at the arrow:
12610 if ( $type eq 'i' ) {
12611 $bond_str = 1.45 * STRONG;
12614 elsif ( $type =~ /^[\)\]\}R]$/ ) {
12615 $bond_str = 0.1 * STRONG + 0.9 * NOMINAL;
12618 # otherwise make strength before an '->' a little over a '+'
12620 if ( $bond_str <= NOMINAL ) {
12621 $bond_str = NOMINAL + 0.01;
12626 if ( $token eq ')' && $next_nonblank_token eq '[' ) {
12627 $bond_str = 0.2 * STRONG + 0.8 * NOMINAL;
12630 # map1.t -- correct for a quirk in perl
12632 && $next_nonblank_type eq 'i'
12633 && $last_nonblank_type eq 'k'
12634 && $is_sort_map_grep{$last_nonblank_token} )
12636 # /^(sort|map|grep)$/ )
12638 $bond_str = NO_BREAK;
12641 # extrude.t: do not break before paren at:
12643 if ( $last_nonblank_type eq 'F' && $next_nonblank_token eq '(' ) {
12644 $bond_str = NO_BREAK;
12647 # good to break after end of code blocks
12648 if ( $type eq '}' && $block_type ) {
12650 $bond_str = 0.5 * WEAK + 0.5 * VERY_WEAK + $code_bias;
12651 $code_bias += $delta_bias;
12654 if ( $type eq 'k' ) {
12656 # allow certain control keywords to stand out
12657 if ( $next_nonblank_type eq 'k'
12658 && $is_last_next_redo_return{$token} )
12660 $bond_str = 0.45 * WEAK + 0.55 * VERY_WEAK;
12663 # Don't break after keyword my. This is a quick fix for a
12664 # rare problem with perl. An example is this line from file
12666 # foreach my $question( Debian::DebConf::ConfigDb::gettree( $this->{'question'} ) )
12668 if ( $token eq 'my' ) {
12669 $bond_str = NO_BREAK;
12674 # good to break before 'if', 'unless', etc
12675 if ( $is_if_brace_follower{$next_nonblank_token} ) {
12676 $bond_str = VERY_WEAK;
12679 if ( $next_nonblank_type eq 'k' ) {
12681 # keywords like 'unless', 'if', etc, within statements
12683 if ( $is_good_keyword_breakpoint{$next_nonblank_token} ) {
12684 $bond_str = VERY_WEAK / 1.05;
12688 # try not to break before a comma-arrow
12689 elsif ( $next_nonblank_type eq '=>' ) {
12690 if ( $bond_str < STRONG ) { $bond_str = STRONG }
12693 #----------------------------------------------------------------------
12694 # only set NO_BREAK's from here on
12695 #----------------------------------------------------------------------
12696 if ( $type eq 'C' or $type eq 'U' ) {
12698 # use strict requires that bare word and => not be separated
12699 if ( $next_nonblank_type eq '=>' ) {
12700 $bond_str = NO_BREAK;
12705 # use strict requires that bare word within braces not start new line
12706 elsif ( $type eq 'L' ) {
12708 if ( $next_nonblank_type eq 'w' ) {
12709 $bond_str = NO_BREAK;
12713 # in older version of perl, use strict can cause problems with
12714 # breaks before bare words following opening parens. For example,
12715 # this will fail under older versions if a break is made between
12718 # open( MAIL, "a long filename or command");
12720 elsif ( $type eq '{' ) {
12722 if ( $token eq '(' && $next_nonblank_type eq 'w' ) {
12724 # but it's fine to break if the word is followed by a '=>'
12725 # or if it is obviously a sub call
12726 my $i_next_next_nonblank = $i_next_nonblank + 1;
12727 my $next_next_type = $types_to_go[$i_next_next_nonblank];
12728 if ( $next_next_type eq 'b'
12729 && $i_next_nonblank < $max_index_to_go )
12731 $i_next_next_nonblank++;
12732 $next_next_type = $types_to_go[$i_next_next_nonblank];
12735 ##if ( $next_next_type ne '=>' ) {
12736 # these are ok: '->xxx', '=>', '('
12738 # We'll check for an old breakpoint and keep a leading
12739 # bareword if it was that way in the input file.
12740 # Presumably it was ok that way. For example, the
12741 # following would remain unchanged:
12744 # January, February, March, April,
12745 # May, June, July, August,
12746 # September, October, November, December,
12749 # This should be sufficient:
12750 if ( !$old_breakpoint_to_go[$i]
12751 && ( $next_next_type eq ',' || $next_next_type eq '}' )
12754 $bond_str = NO_BREAK;
12759 elsif ( $type eq 'w' ) {
12761 if ( $next_nonblank_type eq 'R' ) {
12762 $bond_str = NO_BREAK;
12765 # use strict requires that bare word and => not be separated
12766 if ( $next_nonblank_type eq '=>' ) {
12767 $bond_str = NO_BREAK;
12771 # in fact, use strict hates bare words on any new line. For
12772 # example, a break before the underscore here provokes the
12773 # wrath of use strict:
12774 # if ( -r $fn && ( -s _ || $AllowZeroFilesize)) {
12775 elsif ( $type eq 'F' ) {
12776 $bond_str = NO_BREAK;
12779 # use strict does not allow separating type info from trailing { }
12780 # testfile is readmail.pl
12781 elsif ( $type eq 't' or $type eq 'i' ) {
12783 if ( $next_nonblank_type eq 'L' ) {
12784 $bond_str = NO_BREAK;
12788 # Do not break between a possible filehandle and a ? or / and do
12789 # not introduce a break after it if there is no blank
12791 elsif ( $type eq 'Z' ) {
12796 # if there is no blank and we do not want one. Examples:
12797 # print $x++ # do not break after $x
12798 # print HTML"HELLO" # break ok after HTML
12801 && defined( $want_left_space{$next_type} )
12802 && $want_left_space{$next_type} == WS_NO
12805 # or we might be followed by the start of a quote
12806 || $next_nonblank_type =~ /^[\/\?]$/
12809 $bond_str = NO_BREAK;
12813 # Do not break before a possible file handle
12814 if ( $next_nonblank_type eq 'Z' ) {
12815 $bond_str = NO_BREAK;
12818 # As a defensive measure, do not break between a '(' and a
12819 # filehandle. In some cases, this can cause an error. For
12820 # example, the following program works:
12827 # But this program fails:
12835 # This is normally only a problem with the 'extrude' option
12836 if ( $next_nonblank_type eq 'Y' && $token eq '(' ) {
12837 $bond_str = NO_BREAK;
12840 # patch to put cuddled elses back together when on multiple
12841 # lines, as in: } \n else \n { \n
12842 if ($rOpts_cuddled_else) {
12844 if ( ( $token eq 'else' ) && ( $next_nonblank_type eq '{' )
12845 || ( $type eq '}' ) && ( $next_nonblank_token eq 'else' ) )
12847 $bond_str = NO_BREAK;
12851 # keep '}' together with ';'
12852 if ( ( $token eq '}' ) && ( $next_nonblank_type eq ';' ) ) {
12853 $bond_str = NO_BREAK;
12856 # never break between sub name and opening paren
12857 if ( ( $type eq 'w' ) && ( $next_nonblank_token eq '(' ) ) {
12858 $bond_str = NO_BREAK;
12861 #---------------------------------------------------------------
12863 # now take nesting depth into account
12864 #---------------------------------------------------------------
12865 # final strength incorporates the bond strength and nesting depth
12868 if ( defined($bond_str) && !$nobreak_to_go[$i] ) {
12869 if ( $total_nesting_depth > 0 ) {
12870 $strength = $bond_str + $total_nesting_depth;
12873 $strength = $bond_str;
12877 $strength = NO_BREAK;
12880 # always break after side comment
12881 if ( $type eq '#' ) { $strength = 0 }
12883 $bond_strength_to_go[$i] = $strength;
12885 FORMATTER_DEBUG_FLAG_BOND && do {
12886 my $str = substr( $token, 0, 15 );
12887 $str .= ' ' x ( 16 - length($str) );
12889 "BOND: i=$i $str $type $next_nonblank_type depth=$total_nesting_depth strength=$bond_str_1 -> $bond_str -> $strength \n";
12896 sub pad_array_to_go {
12898 # to simplify coding in scan_list and set_bond_strengths, it helps
12899 # to create some extra blank tokens at the end of the arrays
12900 $tokens_to_go[ $max_index_to_go + 1 ] = '';
12901 $tokens_to_go[ $max_index_to_go + 2 ] = '';
12902 $types_to_go[ $max_index_to_go + 1 ] = 'b';
12903 $types_to_go[ $max_index_to_go + 2 ] = 'b';
12904 $nesting_depth_to_go[ $max_index_to_go + 1 ] =
12905 $nesting_depth_to_go[$max_index_to_go];
12908 if ( $is_closing_type{ $types_to_go[$max_index_to_go] } ) {
12909 if ( $nesting_depth_to_go[$max_index_to_go] <= 0 ) {
12911 # shouldn't happen:
12912 unless ( get_saw_brace_error() ) {
12914 "Program bug in scan_list: hit nesting error which should have been caught\n"
12916 report_definite_bug();
12920 $nesting_depth_to_go[ $max_index_to_go + 1 ] -= 1;
12925 elsif ( $is_opening_type{ $types_to_go[$max_index_to_go] } ) {
12926 $nesting_depth_to_go[ $max_index_to_go + 1 ] += 1;
12930 { # begin scan_list
12933 $block_type, $current_depth,
12935 $i_last_nonblank_token, $last_colon_sequence_number,
12936 $last_nonblank_token, $last_nonblank_type,
12937 $last_old_breakpoint_count, $minimum_depth,
12938 $next_nonblank_block_type, $next_nonblank_token,
12939 $next_nonblank_type, $old_breakpoint_count,
12940 $starting_breakpoint_count, $starting_depth,
12946 @breakpoint_stack, @breakpoint_undo_stack,
12947 @comma_index, @container_type,
12948 @identifier_count_stack, @index_before_arrow,
12949 @interrupted_list, @item_count_stack,
12950 @last_comma_index, @last_dot_index,
12951 @last_nonblank_type, @old_breakpoint_count_stack,
12952 @opening_structure_index_stack, @rfor_semicolon_list,
12953 @has_old_logical_breakpoints, @rand_or_list,
12957 # routine to define essential variables when we go 'up' to
12959 sub check_for_new_minimum_depth {
12961 if ( $depth < $minimum_depth ) {
12963 $minimum_depth = $depth;
12965 # these arrays need not retain values between calls
12966 $breakpoint_stack[$depth] = $starting_breakpoint_count;
12967 $container_type[$depth] = "";
12968 $identifier_count_stack[$depth] = 0;
12969 $index_before_arrow[$depth] = -1;
12970 $interrupted_list[$depth] = 1;
12971 $item_count_stack[$depth] = 0;
12972 $last_nonblank_type[$depth] = "";
12973 $opening_structure_index_stack[$depth] = -1;
12975 $breakpoint_undo_stack[$depth] = undef;
12976 $comma_index[$depth] = undef;
12977 $last_comma_index[$depth] = undef;
12978 $last_dot_index[$depth] = undef;
12979 $old_breakpoint_count_stack[$depth] = undef;
12980 $has_old_logical_breakpoints[$depth] = 0;
12981 $rand_or_list[$depth] = [];
12982 $rfor_semicolon_list[$depth] = [];
12983 $i_equals[$depth] = -1;
12985 # these arrays must retain values between calls
12986 if ( !defined( $has_broken_sublist[$depth] ) ) {
12987 $dont_align[$depth] = 0;
12988 $has_broken_sublist[$depth] = 0;
12989 $want_comma_break[$depth] = 0;
12994 # routine to decide which commas to break at within a container;
12996 # $bp_count = number of comma breakpoints set
12997 # $do_not_break_apart = a flag indicating if container need not
12999 sub set_comma_breakpoints {
13003 my $do_not_break_apart = 0;
13004 if ( $item_count_stack[$dd] && !$dont_align[$dd] ) {
13006 my $fbc = $forced_breakpoint_count;
13008 # always open comma lists not preceded by keywords,
13009 # barewords, identifiers (that is, anything that doesn't
13010 # look like a function call)
13011 my $must_break_open = $last_nonblank_type[$dd] !~ /^[kwiU]$/;
13013 set_comma_breakpoints_do(
13015 $opening_structure_index_stack[$dd],
13017 $item_count_stack[$dd],
13018 $identifier_count_stack[$dd],
13020 $next_nonblank_type,
13021 $container_type[$dd],
13022 $interrupted_list[$dd],
13023 \$do_not_break_apart,
13026 $bp_count = $forced_breakpoint_count - $fbc;
13027 $do_not_break_apart = 0 if $must_break_open;
13029 return ( $bp_count, $do_not_break_apart );
13032 my %is_logical_container;
13035 @_ = qw# if elsif unless while and or err not && | || ? : ! #;
13036 @is_logical_container{@_} = (1) x scalar(@_);
13039 sub set_for_semicolon_breakpoints {
13041 foreach ( @{ $rfor_semicolon_list[$dd] } ) {
13042 set_forced_breakpoint($_);
13046 sub set_logical_breakpoints {
13049 $item_count_stack[$dd] == 0
13050 && $is_logical_container{ $container_type[$dd] }
13053 || $has_old_logical_breakpoints[$dd]
13057 # Look for breaks in this order:
13060 foreach my $i ( 0 .. 3 ) {
13061 if ( $rand_or_list[$dd][$i] ) {
13062 foreach ( @{ $rand_or_list[$dd][$i] } ) {
13063 set_forced_breakpoint($_);
13066 # break at any 'if' and 'unless' too
13067 foreach ( @{ $rand_or_list[$dd][4] } ) {
13068 set_forced_breakpoint($_);
13070 $rand_or_list[$dd] = [];
13077 sub is_unbreakable_container {
13079 # never break a container of one of these types
13080 # because bad things can happen (map1.t)
13082 $is_sort_map_grep{ $container_type[$dd] };
13087 # This routine is responsible for setting line breaks for all lists,
13088 # so that hierarchical structure can be displayed and so that list
13089 # items can be vertically aligned. The output of this routine is
13090 # stored in the array @forced_breakpoint_to_go, which is used to set
13091 # final breakpoints.
13093 $starting_depth = $nesting_depth_to_go[0];
13096 $current_depth = $starting_depth;
13098 $last_colon_sequence_number = -1;
13099 $last_nonblank_token = ';';
13100 $last_nonblank_type = ';';
13101 $last_nonblank_block_type = ' ';
13102 $last_old_breakpoint_count = 0;
13103 $minimum_depth = $current_depth + 1; # forces update in check below
13104 $old_breakpoint_count = 0;
13105 $starting_breakpoint_count = $forced_breakpoint_count;
13108 $type_sequence = '';
13110 check_for_new_minimum_depth($current_depth);
13112 my $is_long_line = excess_line_length( 0, $max_index_to_go ) > 0;
13113 my $want_previous_breakpoint = -1;
13115 my $saw_good_breakpoint;
13116 my $i_line_end = -1;
13117 my $i_line_start = -1;
13119 # loop over all tokens in this batch
13120 while ( ++$i <= $max_index_to_go ) {
13121 if ( $type ne 'b' ) {
13122 $i_last_nonblank_token = $i - 1;
13123 $last_nonblank_type = $type;
13124 $last_nonblank_token = $token;
13125 $last_nonblank_block_type = $block_type;
13127 $type = $types_to_go[$i];
13128 $block_type = $block_type_to_go[$i];
13129 $token = $tokens_to_go[$i];
13130 $type_sequence = $type_sequence_to_go[$i];
13131 my $next_type = $types_to_go[ $i + 1 ];
13132 my $next_token = $tokens_to_go[ $i + 1 ];
13133 my $i_next_nonblank = ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
13134 $next_nonblank_type = $types_to_go[$i_next_nonblank];
13135 $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
13136 $next_nonblank_block_type = $block_type_to_go[$i_next_nonblank];
13138 # set break if flag was set
13139 if ( $want_previous_breakpoint >= 0 ) {
13140 set_forced_breakpoint($want_previous_breakpoint);
13141 $want_previous_breakpoint = -1;
13144 $last_old_breakpoint_count = $old_breakpoint_count;
13145 if ( $old_breakpoint_to_go[$i] ) {
13147 $i_line_start = $i_next_nonblank;
13149 $old_breakpoint_count++;
13151 # Break before certain keywords if user broke there and
13152 # this is a 'safe' break point. The idea is to retain
13153 # any preferred breaks for sequential list operations,
13154 # like a schwartzian transform.
13155 if ($rOpts_break_at_old_keyword_breakpoints) {
13157 $next_nonblank_type eq 'k'
13158 && $is_keyword_returning_list{$next_nonblank_token}
13159 && ( $type =~ /^[=\)\]\}Riw]$/
13161 && $is_keyword_returning_list{$token} )
13165 # we actually have to set this break next time through
13166 # the loop because if we are at a closing token (such
13167 # as '}') which forms a one-line block, this break might
13169 $want_previous_breakpoint = $i;
13173 next if ( $type eq 'b' );
13174 $depth = $nesting_depth_to_go[ $i + 1 ];
13176 # safety check - be sure we always break after a comment
13177 # Shouldn't happen .. an error here probably means that the
13178 # nobreak flag did not get turned off correctly during
13180 if ( $type eq '#' ) {
13181 if ( $i != $max_index_to_go ) {
13183 "Non-fatal program bug: backup logic needed to break after a comment\n"
13185 report_definite_bug();
13186 $nobreak_to_go[$i] = 0;
13187 set_forced_breakpoint($i);
13191 # Force breakpoints at certain tokens in long lines.
13192 # Note that such breakpoints will be undone later if these tokens
13193 # are fully contained within parens on a line.
13196 # break before a keyword within a line
13200 # if one of these keywords:
13201 && $token =~ /^(if|unless|while|until|for)$/
13203 # but do not break at something like '1 while'
13204 && ( $last_nonblank_type ne 'n' || $i > 2 )
13206 # and let keywords follow a closing 'do' brace
13207 && $last_nonblank_block_type ne 'do'
13212 # or container is broken (by side-comment, etc)
13213 || ( $next_nonblank_token eq '('
13214 && $mate_index_to_go[$i_next_nonblank] < $i )
13218 set_forced_breakpoint( $i - 1 );
13221 # remember locations of '||' and '&&' for possible breaks if we
13222 # decide this is a long logical expression.
13223 if ( $type eq '||' ) {
13224 push @{ $rand_or_list[$depth][2] }, $i;
13225 ++$has_old_logical_breakpoints[$depth]
13226 if ( ( $i == $i_line_start || $i == $i_line_end )
13227 && $rOpts_break_at_old_logical_breakpoints );
13229 elsif ( $type eq '&&' ) {
13230 push @{ $rand_or_list[$depth][3] }, $i;
13231 ++$has_old_logical_breakpoints[$depth]
13232 if ( ( $i == $i_line_start || $i == $i_line_end )
13233 && $rOpts_break_at_old_logical_breakpoints );
13235 elsif ( $type eq 'f' ) {
13236 push @{ $rfor_semicolon_list[$depth] }, $i;
13238 elsif ( $type eq 'k' ) {
13239 if ( $token eq 'and' ) {
13240 push @{ $rand_or_list[$depth][1] }, $i;
13241 ++$has_old_logical_breakpoints[$depth]
13242 if ( ( $i == $i_line_start || $i == $i_line_end )
13243 && $rOpts_break_at_old_logical_breakpoints );
13246 # break immediately at 'or's which are probably not in a logical
13247 # block -- but we will break in logical breaks below so that
13248 # they do not add to the forced_breakpoint_count
13249 elsif ( $token eq 'or' ) {
13250 push @{ $rand_or_list[$depth][0] }, $i;
13251 ++$has_old_logical_breakpoints[$depth]
13252 if ( ( $i == $i_line_start || $i == $i_line_end )
13253 && $rOpts_break_at_old_logical_breakpoints );
13254 if ( $is_logical_container{ $container_type[$depth] } ) {
13257 if ($is_long_line) { set_forced_breakpoint($i) }
13258 elsif ( ( $i == $i_line_start || $i == $i_line_end )
13259 && $rOpts_break_at_old_logical_breakpoints )
13261 $saw_good_breakpoint = 1;
13265 elsif ( $token eq 'if' || $token eq 'unless' ) {
13266 push @{ $rand_or_list[$depth][4] }, $i;
13267 if ( ( $i == $i_line_start || $i == $i_line_end )
13268 && $rOpts_break_at_old_logical_breakpoints )
13270 set_forced_breakpoint($i);
13274 elsif ( $is_assignment{$type} ) {
13275 $i_equals[$depth] = $i;
13278 if ($type_sequence) {
13280 # handle any postponed closing breakpoints
13281 if ( $token =~ /^[\)\]\}\:]$/ ) {
13282 if ( $type eq ':' ) {
13283 $last_colon_sequence_number = $type_sequence;
13285 # TESTING: retain break at a ':' line break
13286 if ( ( $i == $i_line_start || $i == $i_line_end )
13287 && $rOpts_break_at_old_ternary_breakpoints )
13291 set_forced_breakpoint($i);
13293 # break at previous '='
13294 if ( $i_equals[$depth] > 0 ) {
13295 set_forced_breakpoint( $i_equals[$depth] );
13296 $i_equals[$depth] = -1;
13300 if ( defined( $postponed_breakpoint{$type_sequence} ) ) {
13301 my $inc = ( $type eq ':' ) ? 0 : 1;
13302 set_forced_breakpoint( $i - $inc );
13303 delete $postponed_breakpoint{$type_sequence};
13307 # set breaks at ?/: if they will get separated (and are
13308 # not a ?/: chain), or if the '?' is at the end of the
13310 elsif ( $token eq '?' ) {
13311 my $i_colon = $mate_index_to_go[$i];
13313 $i_colon <= 0 # the ':' is not in this batch
13314 || $i == 0 # this '?' is the first token of the line
13316 $max_index_to_go # or this '?' is the last token
13320 # don't break at a '?' if preceded by ':' on
13321 # this line of previous ?/: pair on this line.
13322 # This is an attempt to preserve a chain of ?/:
13323 # expressions (elsif2.t). And don't break if
13324 # this has a side comment.
13325 set_forced_breakpoint($i)
13327 $type_sequence == (
13328 $last_colon_sequence_number +
13329 TYPE_SEQUENCE_INCREMENT
13331 || $tokens_to_go[$max_index_to_go] eq '#'
13333 set_closing_breakpoint($i);
13338 #print "LISTX sees: i=$i type=$type tok=$token block=$block_type depth=$depth\n";
13340 #------------------------------------------------------------
13341 # Handle Increasing Depth..
13343 # prepare for a new list when depth increases
13344 # token $i is a '(','{', or '['
13345 #------------------------------------------------------------
13346 if ( $depth > $current_depth ) {
13348 $breakpoint_stack[$depth] = $forced_breakpoint_count;
13349 $breakpoint_undo_stack[$depth] = $forced_breakpoint_undo_count;
13350 $has_broken_sublist[$depth] = 0;
13351 $identifier_count_stack[$depth] = 0;
13352 $index_before_arrow[$depth] = -1;
13353 $interrupted_list[$depth] = 0;
13354 $item_count_stack[$depth] = 0;
13355 $last_comma_index[$depth] = undef;
13356 $last_dot_index[$depth] = undef;
13357 $last_nonblank_type[$depth] = $last_nonblank_type;
13358 $old_breakpoint_count_stack[$depth] = $old_breakpoint_count;
13359 $opening_structure_index_stack[$depth] = $i;
13360 $rand_or_list[$depth] = [];
13361 $rfor_semicolon_list[$depth] = [];
13362 $i_equals[$depth] = -1;
13363 $want_comma_break[$depth] = 0;
13364 $container_type[$depth] =
13365 ( $last_nonblank_type =~ /^(k|=>|&&|\|\||\?|\:|\.)$/ )
13366 ? $last_nonblank_token
13368 $has_old_logical_breakpoints[$depth] = 0;
13370 # if line ends here then signal closing token to break
13371 if ( $next_nonblank_type eq 'b' || $next_nonblank_type eq '#' )
13373 set_closing_breakpoint($i);
13376 # Not all lists of values should be vertically aligned..
13377 $dont_align[$depth] =
13379 # code BLOCKS are handled at a higher level
13380 ( $block_type ne "" )
13382 # certain paren lists
13383 || ( $type eq '(' ) && (
13385 # it does not usually look good to align a list of
13386 # identifiers in a parameter list, as in:
13387 # my($var1, $var2, ...)
13388 # (This test should probably be refined, for now I'm just
13389 # testing for any keyword)
13390 ( $last_nonblank_type eq 'k' )
13392 # a trailing '(' usually indicates a non-list
13393 || ( $next_nonblank_type eq '(' )
13396 # patch to outdent opening brace of long if/for/..
13397 # statements (like this one). See similar coding in
13398 # set_continuation breaks. We have also catch it here for
13399 # short line fragments which otherwise will not go through
13400 # set_continuation_breaks.
13404 # if we have the ')' but not its '(' in this batch..
13405 && ( $last_nonblank_token eq ')' )
13406 && $mate_index_to_go[$i_last_nonblank_token] < 0
13408 # and user wants brace to left
13409 && !$rOpts->{'opening-brace-always-on-right'}
13411 && ( $type eq '{' ) # should be true
13412 && ( $token eq '{' ) # should be true
13415 set_forced_breakpoint( $i - 1 );
13419 #------------------------------------------------------------
13420 # Handle Decreasing Depth..
13422 # finish off any old list when depth decreases
13423 # token $i is a ')','}', or ']'
13424 #------------------------------------------------------------
13425 elsif ( $depth < $current_depth ) {
13427 check_for_new_minimum_depth($depth);
13429 # force all outer logical containers to break after we see on
13431 $has_old_logical_breakpoints[$depth] ||=
13432 $has_old_logical_breakpoints[$current_depth];
13434 # Patch to break between ') {' if the paren list is broken.
13435 # There is similar logic in set_continuation_breaks for
13436 # non-broken lists.
13438 && $next_nonblank_block_type
13439 && $interrupted_list[$current_depth]
13440 && $next_nonblank_type eq '{'
13441 && !$rOpts->{'opening-brace-always-on-right'} )
13443 set_forced_breakpoint($i);
13446 #print "LISTY sees: i=$i type=$type tok=$token block=$block_type depth=$depth next=$next_nonblank_type next_block=$next_nonblank_block_type inter=$interrupted_list[$current_depth]\n";
13448 # set breaks at commas if necessary
13449 my ( $bp_count, $do_not_break_apart ) =
13450 set_comma_breakpoints($current_depth);
13452 my $i_opening = $opening_structure_index_stack[$current_depth];
13453 my $saw_opening_structure = ( $i_opening >= 0 );
13455 # this term is long if we had to break at interior commas..
13456 my $is_long_term = $bp_count > 0;
13458 # ..or if the length between opening and closing parens exceeds
13459 # allowed line length
13460 if ( !$is_long_term && $saw_opening_structure ) {
13461 my $i_opening_minus = find_token_starting_list($i_opening);
13463 # Note: we have to allow for one extra space after a
13464 # closing token so that we do not strand a comma or
13465 # semicolon, hence the '>=' here (oneline.t)
13467 excess_line_length( $i_opening_minus, $i ) >= 0;
13470 # We've set breaks after all comma-arrows. Now we have to
13471 # undo them if this can be a one-line block
13472 # (the only breakpoints set will be due to comma-arrows)
13475 # user doesn't require breaking after all comma-arrows
13476 ( $rOpts_comma_arrow_breakpoints != 0 )
13478 # and if the opening structure is in this batch
13479 && $saw_opening_structure
13481 # and either on the same old line
13483 $old_breakpoint_count_stack[$current_depth] ==
13484 $last_old_breakpoint_count
13486 # or user wants to form long blocks with arrows
13487 || $rOpts_comma_arrow_breakpoints == 2
13490 # and we made some breakpoints between the opening and closing
13491 && ( $breakpoint_undo_stack[$current_depth] <
13492 $forced_breakpoint_undo_count )
13494 # and this block is short enough to fit on one line
13495 # Note: use < because need 1 more space for possible comma
13500 undo_forced_breakpoint_stack(
13501 $breakpoint_undo_stack[$current_depth] );
13504 # now see if we have any comma breakpoints left
13505 my $has_comma_breakpoints =
13506 ( $breakpoint_stack[$current_depth] !=
13507 $forced_breakpoint_count );
13509 # update broken-sublist flag of the outer container
13510 $has_broken_sublist[$depth] = $has_broken_sublist[$depth]
13511 || $has_broken_sublist[$current_depth]
13513 || $has_comma_breakpoints;
13515 # Having come to the closing ')', '}', or ']', now we have to decide if we
13516 # should 'open up' the structure by placing breaks at the opening and
13517 # closing containers. This is a tricky decision. Here are some of the
13518 # basic considerations:
13520 # -If this is a BLOCK container, then any breakpoints will have already
13521 # been set (and according to user preferences), so we need do nothing here.
13523 # -If we have a comma-separated list for which we can align the list items,
13524 # then we need to do so because otherwise the vertical aligner cannot
13525 # currently do the alignment.
13527 # -If this container does itself contain a container which has been broken
13528 # open, then it should be broken open to properly show the structure.
13530 # -If there is nothing to align, and no other reason to break apart,
13531 # then do not do it.
13533 # We will not break open the parens of a long but 'simple' logical expression.
13536 # This is an example of a simple logical expression and its formatting:
13538 # if ( $bigwasteofspace1 && $bigwasteofspace2
13539 # || $bigwasteofspace3 && $bigwasteofspace4 )
13541 # Most people would prefer this than the 'spacey' version:
13544 # $bigwasteofspace1 && $bigwasteofspace2
13545 # || $bigwasteofspace3 && $bigwasteofspace4
13548 # To illustrate the rules for breaking logical expressions, consider:
13552 # and ( exists $ids_excl_uc{$id_uc}
13553 # or grep $id_uc =~ /$_/, @ids_excl_uc ))
13555 # This is on the verge of being difficult to read. The current default is to
13556 # open it up like this:
13561 # and ( exists $ids_excl_uc{$id_uc}
13562 # or grep $id_uc =~ /$_/, @ids_excl_uc )
13565 # This is a compromise which tries to avoid being too dense and to spacey.
13566 # A more spaced version would be:
13572 # exists $ids_excl_uc{$id_uc}
13573 # or grep $id_uc =~ /$_/, @ids_excl_uc
13577 # Some people might prefer the spacey version -- an option could be added. The
13578 # innermost expression contains a long block '( exists $ids_... ')'.
13580 # Here is how the logic goes: We will force a break at the 'or' that the
13581 # innermost expression contains, but we will not break apart its opening and
13582 # closing containers because (1) it contains no multi-line sub-containers itself,
13583 # and (2) there is no alignment to be gained by breaking it open like this
13586 # exists $ids_excl_uc{$id_uc}
13587 # or grep $id_uc =~ /$_/, @ids_excl_uc
13590 # (although this looks perfectly ok and might be good for long expressions). The
13591 # outer 'if' container, though, contains a broken sub-container, so it will be
13592 # broken open to avoid too much density. Also, since it contains no 'or's, there
13593 # will be a forced break at its 'and'.
13595 # set some flags telling something about this container..
13596 my $is_simple_logical_expression = 0;
13597 if ( $item_count_stack[$current_depth] == 0
13598 && $saw_opening_structure
13599 && $tokens_to_go[$i_opening] eq '('
13600 && $is_logical_container{ $container_type[$current_depth] }
13604 # This seems to be a simple logical expression with
13605 # no existing breakpoints. Set a flag to prevent
13607 if ( !$has_comma_breakpoints ) {
13608 $is_simple_logical_expression = 1;
13611 # This seems to be a simple logical expression with
13612 # breakpoints (broken sublists, for example). Break
13613 # at all 'or's and '||'s.
13615 set_logical_breakpoints($current_depth);
13620 && @{ $rfor_semicolon_list[$current_depth] } )
13622 set_for_semicolon_breakpoints($current_depth);
13624 # open up a long 'for' or 'foreach' container to allow
13625 # leading term alignment unless -lp is used.
13626 $has_comma_breakpoints = 1
13627 unless $rOpts_line_up_parentheses;
13632 # breaks for code BLOCKS are handled at a higher level
13635 # we do not need to break at the top level of an 'if'
13637 && !$is_simple_logical_expression
13639 ## modification to keep ': (' containers vertically tight;
13640 ## but probably better to let user set -vt=1 to avoid
13641 ## inconsistency with other paren types
13642 ## && ($container_type[$current_depth] ne ':')
13644 # otherwise, we require one of these reasons for breaking:
13647 # - this term has forced line breaks
13648 $has_comma_breakpoints
13650 # - the opening container is separated from this batch
13651 # for some reason (comment, blank line, code block)
13652 # - this is a non-paren container spanning multiple lines
13653 || !$saw_opening_structure
13655 # - this is a long block contained in another breakable
13658 && $container_environment_to_go[$i_opening] ne
13664 # For -lp option, we must put a breakpoint before
13665 # the token which has been identified as starting
13666 # this indentation level. This is necessary for
13667 # proper alignment.
13668 if ( $rOpts_line_up_parentheses && $saw_opening_structure )
13670 my $item = $leading_spaces_to_go[ $i_opening + 1 ];
13671 if ( $i_opening + 1 < $max_index_to_go
13672 && $types_to_go[ $i_opening + 1 ] eq 'b' )
13674 $item = $leading_spaces_to_go[ $i_opening + 2 ];
13676 if ( defined($item) ) {
13677 my $i_start_2 = $item->get_STARTING_INDEX();
13679 defined($i_start_2)
13681 # we are breaking after an opening brace, paren,
13682 # so don't break before it too
13683 && $i_start_2 ne $i_opening
13687 # Only break for breakpoints at the same
13688 # indentation level as the opening paren
13689 my $test1 = $nesting_depth_to_go[$i_opening];
13690 my $test2 = $nesting_depth_to_go[$i_start_2];
13691 if ( $test2 == $test1 ) {
13692 set_forced_breakpoint( $i_start_2 - 1 );
13698 # break after opening structure.
13699 # note: break before closing structure will be automatic
13700 if ( $minimum_depth <= $current_depth ) {
13702 set_forced_breakpoint($i_opening)
13703 unless ( $do_not_break_apart
13704 || is_unbreakable_container($current_depth) );
13706 # break at '.' of lower depth level before opening token
13707 if ( $last_dot_index[$depth] ) {
13708 set_forced_breakpoint( $last_dot_index[$depth] );
13711 # break before opening structure if preeced by another
13712 # closing structure and a comma. This is normally
13713 # done by the previous closing brace, but not
13714 # if it was a one-line block.
13715 if ( $i_opening > 2 ) {
13717 ( $types_to_go[ $i_opening - 1 ] eq 'b' )
13721 if ( $types_to_go[$i_prev] eq ','
13722 && $types_to_go[ $i_prev - 1 ] =~ /^[\)\}]$/ )
13724 set_forced_breakpoint($i_prev);
13727 # also break before something like ':(' or '?('
13730 $types_to_go[$i_prev] =~ /^([k\:\?]|&&|\|\|)$/ )
13732 my $token_prev = $tokens_to_go[$i_prev];
13733 if ( $want_break_before{$token_prev} ) {
13734 set_forced_breakpoint($i_prev);
13740 # break after comma following closing structure
13741 if ( $next_type eq ',' ) {
13742 set_forced_breakpoint( $i + 1 );
13745 # break before an '=' following closing structure
13747 $is_assignment{$next_nonblank_type}
13748 && ( $breakpoint_stack[$current_depth] !=
13749 $forced_breakpoint_count )
13752 set_forced_breakpoint($i);
13755 # break at any comma before the opening structure Added
13756 # for -lp, but seems to be good in general. It isn't
13757 # obvious how far back to look; the '5' below seems to
13758 # work well and will catch the comma in something like
13759 # push @list, myfunc( $param, $param, ..
13761 my $icomma = $last_comma_index[$depth];
13762 if ( defined($icomma) && ( $i_opening - $icomma ) < 5 ) {
13763 unless ( $forced_breakpoint_to_go[$icomma] ) {
13764 set_forced_breakpoint($icomma);
13767 } # end logic to open up a container
13769 # Break open a logical container open if it was already open
13770 elsif ($is_simple_logical_expression
13771 && $has_old_logical_breakpoints[$current_depth] )
13773 set_logical_breakpoints($current_depth);
13776 # Handle long container which does not get opened up
13777 elsif ($is_long_term) {
13779 # must set fake breakpoint to alert outer containers that
13781 set_fake_breakpoint();
13785 #------------------------------------------------------------
13786 # Handle this token
13787 #------------------------------------------------------------
13789 $current_depth = $depth;
13791 # handle comma-arrow
13792 if ( $type eq '=>' ) {
13793 next if ( $last_nonblank_type eq '=>' );
13794 next if $rOpts_break_at_old_comma_breakpoints;
13795 next if $rOpts_comma_arrow_breakpoints == 3;
13796 $want_comma_break[$depth] = 1;
13797 $index_before_arrow[$depth] = $i_last_nonblank_token;
13801 elsif ( $type eq '.' ) {
13802 $last_dot_index[$depth] = $i;
13805 # Turn off alignment if we are sure that this is not a list
13806 # environment. To be safe, we will do this if we see certain
13807 # non-list tokens, such as ';', and also the environment is
13808 # not a list. Note that '=' could be in any of the = operators
13809 # (lextest.t). We can't just use the reported environment
13810 # because it can be incorrect in some cases.
13811 elsif ( ( $type =~ /^[\;\<\>\~]$/ || $is_assignment{$type} )
13812 && $container_environment_to_go[$i] ne 'LIST' )
13814 $dont_align[$depth] = 1;
13815 $want_comma_break[$depth] = 0;
13816 $index_before_arrow[$depth] = -1;
13819 # now just handle any commas
13820 next unless ( $type eq ',' );
13822 $last_dot_index[$depth] = undef;
13823 $last_comma_index[$depth] = $i;
13825 # break here if this comma follows a '=>'
13826 # but not if there is a side comment after the comma
13827 if ( $want_comma_break[$depth] ) {
13829 if ( $next_nonblank_type =~ /^[\)\}\]R]$/ ) {
13830 $want_comma_break[$depth] = 0;
13831 $index_before_arrow[$depth] = -1;
13835 set_forced_breakpoint($i) unless ( $next_nonblank_type eq '#' );
13837 # break before the previous token if it looks safe
13838 # Example of something that we will not try to break before:
13839 # DBI::SQL_SMALLINT() => $ado_consts->{adSmallInt},
13840 # Also we don't want to break at a binary operator (like +):
13844 # $y - $R, -fill => 'black',
13846 my $ibreak = $index_before_arrow[$depth] - 1;
13848 && $tokens_to_go[ $ibreak + 1 ] !~ /^[\)\}\]]$/ )
13850 if ( $tokens_to_go[$ibreak] eq '-' ) { $ibreak-- }
13851 if ( $types_to_go[$ibreak] eq 'b' ) { $ibreak-- }
13852 if ( $types_to_go[$ibreak] =~ /^[,wiZCUG\(\{\[]$/ ) {
13854 # don't break pointer calls, such as the following:
13855 # File::Spec->curdir => 1,
13856 # (This is tokenized as adjacent 'w' tokens)
13857 if ( $tokens_to_go[ $ibreak + 1 ] !~ /^->/ ) {
13858 set_forced_breakpoint($ibreak);
13863 $want_comma_break[$depth] = 0;
13864 $index_before_arrow[$depth] = -1;
13866 # handle list which mixes '=>'s and ','s:
13867 # treat any list items so far as an interrupted list
13868 $interrupted_list[$depth] = 1;
13872 # skip past these commas if we are not supposed to format them
13873 next if ( $dont_align[$depth] );
13875 # break after all commas above starting depth
13876 if ( $depth < $starting_depth ) {
13877 set_forced_breakpoint($i) unless ( $next_nonblank_type eq '#' );
13881 # add this comma to the list..
13882 my $item_count = $item_count_stack[$depth];
13883 if ( $item_count == 0 ) {
13885 # but do not form a list with no opening structure
13888 # open INFILE_COPY, ">$input_file_copy"
13889 # or die ("very long message");
13891 if ( ( $opening_structure_index_stack[$depth] < 0 )
13892 && $container_environment_to_go[$i] eq 'BLOCK' )
13894 $dont_align[$depth] = 1;
13899 $comma_index[$depth][$item_count] = $i;
13900 ++$item_count_stack[$depth];
13901 if ( $last_nonblank_type =~ /^[iR\]]$/ ) {
13902 $identifier_count_stack[$depth]++;
13906 #-------------------------------------------
13907 # end of loop over all tokens in this batch
13908 #-------------------------------------------
13910 # set breaks for any unfinished lists ..
13911 for ( my $dd = $current_depth ; $dd >= $minimum_depth ; $dd-- ) {
13913 $interrupted_list[$dd] = 1;
13914 $has_broken_sublist[$dd] = 1 if ( $dd < $current_depth );
13915 set_comma_breakpoints($dd);
13916 set_logical_breakpoints($dd)
13917 if ( $has_old_logical_breakpoints[$dd] );
13918 set_for_semicolon_breakpoints($dd);
13920 # break open container...
13921 my $i_opening = $opening_structure_index_stack[$dd];
13922 set_forced_breakpoint($i_opening)
13924 is_unbreakable_container($dd)
13926 # Avoid a break which would place an isolated ' or "
13929 && $i_opening >= $max_index_to_go - 2
13930 && $token =~ /^['"]$/ )
13934 # Return a flag indicating if the input file had some good breakpoints.
13935 # This flag will be used to force a break in a line shorter than the
13936 # allowed line length.
13937 if ( $has_old_logical_breakpoints[$current_depth] ) {
13938 $saw_good_breakpoint = 1;
13940 return $saw_good_breakpoint;
13944 sub find_token_starting_list {
13946 # When testing to see if a block will fit on one line, some
13947 # previous token(s) may also need to be on the line; particularly
13948 # if this is a sub call. So we will look back at least one
13949 # token. NOTE: This isn't perfect, but not critical, because
13950 # if we mis-identify a block, it will be wrapped and therefore
13951 # fixed the next time it is formatted.
13952 my $i_opening_paren = shift;
13953 my $i_opening_minus = $i_opening_paren;
13954 my $im1 = $i_opening_paren - 1;
13955 my $im2 = $i_opening_paren - 2;
13956 my $im3 = $i_opening_paren - 3;
13957 my $typem1 = $types_to_go[$im1];
13958 my $typem2 = $im2 >= 0 ? $types_to_go[$im2] : 'b';
13959 if ( $typem1 eq ',' || ( $typem1 eq 'b' && $typem2 eq ',' ) ) {
13960 $i_opening_minus = $i_opening_paren;
13962 elsif ( $tokens_to_go[$i_opening_paren] eq '(' ) {
13963 $i_opening_minus = $im1 if $im1 >= 0;
13965 # walk back to improve length estimate
13966 for ( my $j = $im1 ; $j >= 0 ; $j-- ) {
13967 last if ( $types_to_go[$j] =~ /^[\(\[\{L\}\]\)Rb,]$/ );
13968 $i_opening_minus = $j;
13970 if ( $types_to_go[$i_opening_minus] eq 'b' ) { $i_opening_minus++ }
13972 elsif ( $typem1 eq 'k' ) { $i_opening_minus = $im1 }
13973 elsif ( $typem1 eq 'b' && $im2 >= 0 && $types_to_go[$im2] eq 'k' ) {
13974 $i_opening_minus = $im2;
13976 return $i_opening_minus;
13979 { # begin set_comma_breakpoints_do
13981 my %is_keyword_with_special_leading_term;
13985 # These keywords have prototypes which allow a special leading item
13986 # followed by a list
13988 qw(formline grep kill map printf sprintf push chmod join pack unshift);
13989 @is_keyword_with_special_leading_term{@_} = (1) x scalar(@_);
13992 sub set_comma_breakpoints_do {
13994 # Given a list with some commas, set breakpoints at some of the
13995 # commas, if necessary, to make it easy to read. This list is
13998 $depth, $i_opening_paren, $i_closing_paren,
13999 $item_count, $identifier_count, $rcomma_index,
14000 $next_nonblank_type, $list_type, $interrupted,
14001 $rdo_not_break_apart, $must_break_open,
14004 # nothing to do if no commas seen
14005 return if ( $item_count < 1 );
14006 my $i_first_comma = $$rcomma_index[0];
14007 my $i_true_last_comma = $$rcomma_index[ $item_count - 1 ];
14008 my $i_last_comma = $i_true_last_comma;
14009 if ( $i_last_comma >= $max_index_to_go ) {
14010 $i_last_comma = $$rcomma_index[ --$item_count - 1 ];
14011 return if ( $item_count < 1 );
14014 #---------------------------------------------------------------
14015 # find lengths of all items in the list to calculate page layout
14016 #---------------------------------------------------------------
14017 my $comma_count = $item_count;
14023 my @max_length = ( 0, 0 );
14024 my $first_term_length;
14025 my $i = $i_opening_paren;
14028 for ( my $j = 0 ; $j < $comma_count ; $j++ ) {
14029 $is_odd = 1 - $is_odd;
14030 $i_prev_plus = $i + 1;
14031 $i = $$rcomma_index[$j];
14034 ( $types_to_go[ $i - 1 ] eq 'b' ) ? $i - 2 : $i - 1;
14036 ( $types_to_go[$i_prev_plus] eq 'b' )
14039 push @i_term_begin, $i_term_begin;
14040 push @i_term_end, $i_term_end;
14041 push @i_term_comma, $i;
14043 # note: currently adding 2 to all lengths (for comma and space)
14045 2 + token_sequence_length( $i_term_begin, $i_term_end );
14046 push @item_lengths, $length;
14049 $first_term_length = $length;
14053 if ( $length > $max_length[$is_odd] ) {
14054 $max_length[$is_odd] = $length;
14059 # now we have to make a distinction between the comma count and item
14060 # count, because the item count will be one greater than the comma
14061 # count if the last item is not terminated with a comma
14063 ( $types_to_go[ $i_last_comma + 1 ] eq 'b' )
14064 ? $i_last_comma + 1
14067 ( $types_to_go[ $i_closing_paren - 1 ] eq 'b' )
14068 ? $i_closing_paren - 2
14069 : $i_closing_paren - 1;
14070 my $i_effective_last_comma = $i_last_comma;
14072 my $last_item_length = token_sequence_length( $i_b + 1, $i_e );
14074 if ( $last_item_length > 0 ) {
14076 # add 2 to length because other lengths include a comma and a blank
14077 $last_item_length += 2;
14078 push @item_lengths, $last_item_length;
14079 push @i_term_begin, $i_b + 1;
14080 push @i_term_end, $i_e;
14081 push @i_term_comma, undef;
14083 my $i_odd = $item_count % 2;
14085 if ( $last_item_length > $max_length[$i_odd] ) {
14086 $max_length[$i_odd] = $last_item_length;
14090 $i_effective_last_comma = $i_e + 1;
14092 if ( $types_to_go[ $i_b + 1 ] =~ /^[iR\]]$/ ) {
14093 $identifier_count++;
14097 #---------------------------------------------------------------
14098 # End of length calculations
14099 #---------------------------------------------------------------
14101 #---------------------------------------------------------------
14102 # Compound List Rule 1:
14103 # Break at (almost) every comma for a list containing a broken
14104 # sublist. This has higher priority than the Interrupted List
14106 #---------------------------------------------------------------
14107 if ( $has_broken_sublist[$depth] ) {
14109 # Break at every comma except for a comma between two
14110 # simple, small terms. This prevents long vertical
14111 # columns of, say, just 0's.
14112 my $small_length = 10; # 2 + actual maximum length wanted
14114 # We'll insert a break in long runs of small terms to
14115 # allow alignment in uniform tables.
14116 my $skipped_count = 0;
14117 my $columns = table_columns_available($i_first_comma);
14118 my $fields = int( $columns / $small_length );
14119 if ( $rOpts_maximum_fields_per_table
14120 && $fields > $rOpts_maximum_fields_per_table )
14122 $fields = $rOpts_maximum_fields_per_table;
14124 my $max_skipped_count = $fields - 1;
14126 my $is_simple_last_term = 0;
14127 my $is_simple_next_term = 0;
14128 foreach my $j ( 0 .. $item_count ) {
14129 $is_simple_last_term = $is_simple_next_term;
14130 $is_simple_next_term = 0;
14131 if ( $j < $item_count
14132 && $i_term_end[$j] == $i_term_begin[$j]
14133 && $item_lengths[$j] <= $small_length )
14135 $is_simple_next_term = 1;
14138 if ( $is_simple_last_term
14139 && $is_simple_next_term
14140 && $skipped_count < $max_skipped_count )
14145 $skipped_count = 0;
14146 my $i = $i_term_comma[ $j - 1 ];
14147 last unless defined $i;
14148 set_forced_breakpoint($i);
14152 # always break at the last comma if this list is
14153 # interrupted; we wouldn't want to leave a terminal '{', for
14155 if ($interrupted) { set_forced_breakpoint($i_true_last_comma) }
14159 #my ( $a, $b, $c ) = caller();
14160 #print "LISTX: in set_list $a $c interupt=$interrupted count=$item_count
14161 #i_first = $i_first_comma i_last=$i_last_comma max=$max_index_to_go\n";
14162 #print "depth=$depth has_broken=$has_broken_sublist[$depth] is_multi=$is_multiline opening_paren=($i_opening_paren) \n";
14164 #---------------------------------------------------------------
14165 # Interrupted List Rule:
14166 # A list is is forced to use old breakpoints if it was interrupted
14167 # by side comments or blank lines, or requested by user.
14168 #---------------------------------------------------------------
14169 if ( $rOpts_break_at_old_comma_breakpoints
14171 || $i_opening_paren < 0 )
14173 copy_old_breakpoints( $i_first_comma, $i_true_last_comma );
14177 #---------------------------------------------------------------
14178 # Looks like a list of items. We have to look at it and size it up.
14179 #---------------------------------------------------------------
14181 my $opening_token = $tokens_to_go[$i_opening_paren];
14182 my $opening_environment =
14183 $container_environment_to_go[$i_opening_paren];
14185 #-------------------------------------------------------------------
14186 # Return if this will fit on one line
14187 #-------------------------------------------------------------------
14189 my $i_opening_minus = find_token_starting_list($i_opening_paren);
14191 unless excess_line_length( $i_opening_minus, $i_closing_paren ) > 0;
14193 #-------------------------------------------------------------------
14194 # Now we know that this block spans multiple lines; we have to set
14195 # at least one breakpoint -- real or fake -- as a signal to break
14196 # open any outer containers.
14197 #-------------------------------------------------------------------
14198 set_fake_breakpoint();
14200 # be sure we do not extend beyond the current list length
14201 if ( $i_effective_last_comma >= $max_index_to_go ) {
14202 $i_effective_last_comma = $max_index_to_go - 1;
14205 # Set a flag indicating if we need to break open to keep -lp
14206 # items aligned. This is necessary if any of the list terms
14207 # exceeds the available space after the '('.
14208 my $need_lp_break_open = $must_break_open;
14209 if ( $rOpts_line_up_parentheses && !$must_break_open ) {
14210 my $columns_if_unbroken = $rOpts_maximum_line_length -
14211 total_line_length( $i_opening_minus, $i_opening_paren );
14212 $need_lp_break_open = ( $max_length[0] > $columns_if_unbroken )
14213 || ( $max_length[1] > $columns_if_unbroken )
14214 || ( $first_term_length > $columns_if_unbroken );
14217 # Specify if the list must have an even number of fields or not.
14218 # It is generally safest to assume an even number, because the
14219 # list items might be a hash list. But if we can be sure that
14220 # it is not a hash, then we can allow an odd number for more
14222 my $odd_or_even = 2; # 1 = odd field count ok, 2 = want even count
14224 if ( $identifier_count >= $item_count - 1
14225 || $is_assignment{$next_nonblank_type}
14226 || ( $list_type && $list_type ne '=>' && $list_type !~ /^[\:\?]$/ )
14232 # do we have a long first term which should be
14233 # left on a line by itself?
14234 my $use_separate_first_term = (
14235 $odd_or_even == 1 # only if we can use 1 field/line
14236 && $item_count > 3 # need several items
14237 && $first_term_length >
14238 2 * $max_length[0] - 2 # need long first term
14239 && $first_term_length >
14240 2 * $max_length[1] - 2 # need long first term
14243 # or do we know from the type of list that the first term should
14245 if ( !$use_separate_first_term ) {
14246 if ( $is_keyword_with_special_leading_term{$list_type} ) {
14247 $use_separate_first_term = 1;
14249 # should the container be broken open?
14250 if ( $item_count < 3 ) {
14251 if ( $i_first_comma - $i_opening_paren < 4 ) {
14252 $$rdo_not_break_apart = 1;
14255 elsif ($first_term_length < 20
14256 && $i_first_comma - $i_opening_paren < 4 )
14258 my $columns = table_columns_available($i_first_comma);
14259 if ( $first_term_length < $columns ) {
14260 $$rdo_not_break_apart = 1;
14267 if ($use_separate_first_term) {
14269 # ..set a break and update starting values
14270 $use_separate_first_term = 1;
14271 set_forced_breakpoint($i_first_comma);
14272 $i_opening_paren = $i_first_comma;
14273 $i_first_comma = $$rcomma_index[1];
14275 return if $comma_count == 1;
14276 shift @item_lengths;
14277 shift @i_term_begin;
14279 shift @i_term_comma;
14282 # if not, update the metrics to include the first term
14284 if ( $first_term_length > $max_length[0] ) {
14285 $max_length[0] = $first_term_length;
14289 # Field width parameters
14290 my $pair_width = ( $max_length[0] + $max_length[1] );
14292 ( $max_length[0] > $max_length[1] ) ? $max_length[0] : $max_length[1];
14294 # Number of free columns across the page width for laying out tables
14295 my $columns = table_columns_available($i_first_comma);
14297 # Estimated maximum number of fields which fit this space
14298 # This will be our first guess
14299 my $number_of_fields_max =
14300 maximum_number_of_fields( $columns, $odd_or_even, $max_width,
14302 my $number_of_fields = $number_of_fields_max;
14304 # Find the best-looking number of fields
14305 # and make this our second guess if possible
14306 my ( $number_of_fields_best, $ri_ragged_break_list,
14307 $new_identifier_count )
14308 = study_list_complexity( \@i_term_begin, \@i_term_end, \@item_lengths,
14311 if ( $number_of_fields_best != 0
14312 && $number_of_fields_best < $number_of_fields_max )
14314 $number_of_fields = $number_of_fields_best;
14317 # ----------------------------------------------------------------------
14318 # If we are crowded and the -lp option is being used, try to
14319 # undo some indentation
14320 # ----------------------------------------------------------------------
14322 $rOpts_line_up_parentheses
14324 $number_of_fields == 0
14325 || ( $number_of_fields == 1
14326 && $number_of_fields != $number_of_fields_best )
14330 my $available_spaces = get_AVAILABLE_SPACES_to_go($i_first_comma);
14331 if ( $available_spaces > 0 ) {
14333 my $spaces_wanted = $max_width - $columns; # for 1 field
14335 if ( $number_of_fields_best == 0 ) {
14336 $number_of_fields_best =
14337 get_maximum_fields_wanted( \@item_lengths );
14340 if ( $number_of_fields_best != 1 ) {
14341 my $spaces_wanted_2 =
14342 1 + $pair_width - $columns; # for 2 fields
14343 if ( $available_spaces > $spaces_wanted_2 ) {
14344 $spaces_wanted = $spaces_wanted_2;
14348 if ( $spaces_wanted > 0 ) {
14349 my $deleted_spaces =
14350 reduce_lp_indentation( $i_first_comma, $spaces_wanted );
14353 if ( $deleted_spaces > 0 ) {
14354 $columns = table_columns_available($i_first_comma);
14355 $number_of_fields_max =
14356 maximum_number_of_fields( $columns, $odd_or_even,
14357 $max_width, $pair_width );
14358 $number_of_fields = $number_of_fields_max;
14360 if ( $number_of_fields_best == 1
14361 && $number_of_fields >= 1 )
14363 $number_of_fields = $number_of_fields_best;
14370 # try for one column if two won't work
14371 if ( $number_of_fields <= 0 ) {
14372 $number_of_fields = int( $columns / $max_width );
14375 # The user can place an upper bound on the number of fields,
14376 # which can be useful for doing maintenance on tables
14377 if ( $rOpts_maximum_fields_per_table
14378 && $number_of_fields > $rOpts_maximum_fields_per_table )
14380 $number_of_fields = $rOpts_maximum_fields_per_table;
14383 # How many columns (characters) and lines would this container take
14384 # if no additional whitespace were added?
14385 my $packed_columns = token_sequence_length( $i_opening_paren + 1,
14386 $i_effective_last_comma + 1 );
14387 if ( $columns <= 0 ) { $columns = 1 } # avoid divide by zero
14388 my $packed_lines = 1 + int( $packed_columns / $columns );
14390 # are we an item contained in an outer list?
14391 my $in_hierarchical_list = $next_nonblank_type =~ /^[\}\,]$/;
14393 if ( $number_of_fields <= 0 ) {
14395 # #---------------------------------------------------------------
14396 # # We're in trouble. We can't find a single field width that works.
14397 # # There is no simple answer here; we may have a single long list
14399 # #---------------------------------------------------------------
14401 # In many cases, it may be best to not force a break if there is just one
14402 # comma, because the standard continuation break logic will do a better
14405 # In the common case that all but one of the terms can fit
14406 # on a single line, it may look better not to break open the
14407 # containing parens. Consider, for example
14411 # sort { $color_value{$::a} <=> $color_value{$::b}; }
14414 # which will look like this with the container broken:
14418 # sort { $color_value{$::a} <=> $color_value{$::b}; } keys %colors
14421 # Here is an example of this rule for a long last term:
14423 # log_message( 0, 256, 128,
14424 # "Number of routes in adj-RIB-in to be considered: $peercount" );
14426 # And here is an example with a long first term:
14429 # "%2d wallclock secs (%$f usr %$f sys + %$f cusr %$f csys = %$f CPU)",
14430 # $r, $pu, $ps, $cu, $cs, $tt
14432 # if $style eq 'all';
14434 my $i_last_comma = $$rcomma_index[ $comma_count - 1 ];
14435 my $long_last_term = excess_line_length( 0, $i_last_comma ) <= 0;
14436 my $long_first_term =
14437 excess_line_length( $i_first_comma + 1, $max_index_to_go ) <= 0;
14439 # break at every comma ...
14442 # if requested by user or is best looking
14443 $number_of_fields_best == 1
14445 # or if this is a sublist of a larger list
14446 || $in_hierarchical_list
14448 # or if multiple commas and we dont have a long first or last
14450 || ( $comma_count > 1
14451 && !( $long_last_term || $long_first_term ) )
14454 foreach ( 0 .. $comma_count - 1 ) {
14455 set_forced_breakpoint( $$rcomma_index[$_] );
14458 elsif ($long_last_term) {
14460 set_forced_breakpoint($i_last_comma);
14461 $$rdo_not_break_apart = 1 unless $must_break_open;
14463 elsif ($long_first_term) {
14465 set_forced_breakpoint($i_first_comma);
14469 # let breaks be defined by default bond strength logic
14474 # --------------------------------------------------------
14475 # We have a tentative field count that seems to work.
14476 # How many lines will this require?
14477 # --------------------------------------------------------
14478 my $formatted_lines = $item_count / ($number_of_fields);
14479 if ( $formatted_lines != int $formatted_lines ) {
14480 $formatted_lines = 1 + int $formatted_lines;
14483 # So far we've been trying to fill out to the right margin. But
14484 # compact tables are easier to read, so let's see if we can use fewer
14485 # fields without increasing the number of lines.
14486 $number_of_fields =
14487 compactify_table( $item_count, $number_of_fields, $formatted_lines,
14490 # How many spaces across the page will we fill?
14491 my $columns_per_line =
14492 ( int $number_of_fields / 2 ) * $pair_width +
14493 ( $number_of_fields % 2 ) * $max_width;
14495 my $formatted_columns;
14497 if ( $number_of_fields > 1 ) {
14498 $formatted_columns =
14499 ( $pair_width * ( int( $item_count / 2 ) ) +
14500 ( $item_count % 2 ) * $max_width );
14503 $formatted_columns = $max_width * $item_count;
14505 if ( $formatted_columns < $packed_columns ) {
14506 $formatted_columns = $packed_columns;
14509 my $unused_columns = $formatted_columns - $packed_columns;
14511 # set some empirical parameters to help decide if we should try to
14512 # align; high sparsity does not look good, especially with few lines
14513 my $sparsity = ($unused_columns) / ($formatted_columns);
14514 my $max_allowed_sparsity =
14515 ( $item_count < 3 ) ? 0.1
14516 : ( $packed_lines == 1 ) ? 0.15
14517 : ( $packed_lines == 2 ) ? 0.4
14520 # Begin check for shortcut methods, which avoid treating a list
14521 # as a table for relatively small parenthesized lists. These
14522 # are usually easier to read if not formatted as tables.
14524 $packed_lines <= 2 # probably can fit in 2 lines
14525 && $item_count < 9 # doesn't have too many items
14526 && $opening_environment eq 'BLOCK' # not a sub-container
14527 && $opening_token eq '(' # is paren list
14531 # Shortcut method 1: for -lp and just one comma:
14532 # This is a no-brainer, just break at the comma.
14534 $rOpts_line_up_parentheses # -lp
14535 && $item_count == 2 # two items, one comma
14536 && !$must_break_open
14539 my $i_break = $$rcomma_index[0];
14540 set_forced_breakpoint($i_break);
14541 $$rdo_not_break_apart = 1;
14542 set_non_alignment_flags( $comma_count, $rcomma_index );
14547 # method 2 is for most small ragged lists which might look
14548 # best if not displayed as a table.
14550 ( $number_of_fields == 2 && $item_count == 3 )
14552 $new_identifier_count > 0 # isn't all quotes
14553 && $sparsity > 0.15
14554 ) # would be fairly spaced gaps if aligned
14558 my $break_count = set_ragged_breakpoints( \@i_term_comma,
14559 $ri_ragged_break_list );
14560 ++$break_count if ($use_separate_first_term);
14562 # NOTE: we should really use the true break count here,
14563 # which can be greater if there are large terms and
14564 # little space, but usually this will work well enough.
14565 unless ($must_break_open) {
14567 if ( $break_count <= 1 ) {
14568 $$rdo_not_break_apart = 1;
14570 elsif ( $rOpts_line_up_parentheses && !$need_lp_break_open )
14572 $$rdo_not_break_apart = 1;
14575 set_non_alignment_flags( $comma_count, $rcomma_index );
14579 } # end shortcut methods
14583 FORMATTER_DEBUG_FLAG_SPARSE && do {
14585 "SPARSE:cols=$columns commas=$comma_count items:$item_count ids=$identifier_count pairwidth=$pair_width fields=$number_of_fields lines packed: $packed_lines packed_cols=$packed_columns fmtd:$formatted_lines cols /line:$columns_per_line unused:$unused_columns fmtd:$formatted_columns sparsity=$sparsity allow=$max_allowed_sparsity\n";
14589 #---------------------------------------------------------------
14590 # Compound List Rule 2:
14591 # If this list is too long for one line, and it is an item of a
14592 # larger list, then we must format it, regardless of sparsity
14593 # (ian.t). One reason that we have to do this is to trigger
14594 # Compound List Rule 1, above, which causes breaks at all commas of
14595 # all outer lists. In this way, the structure will be properly
14597 #---------------------------------------------------------------
14599 # Decide if this list is too long for one line unless broken
14600 my $total_columns = table_columns_available($i_opening_paren);
14601 my $too_long = $packed_columns > $total_columns;
14603 # For a paren list, include the length of the token just before the
14604 # '(' because this is likely a sub call, and we would have to
14605 # include the sub name on the same line as the list. This is still
14606 # imprecise, but not too bad. (steve.t)
14607 if ( !$too_long && $i_opening_paren > 0 && $opening_token eq '(' ) {
14609 $too_long = excess_line_length( $i_opening_minus,
14610 $i_effective_last_comma + 1 ) > 0;
14613 # FIXME: For an item after a '=>', try to include the length of the
14614 # thing before the '=>'. This is crude and should be improved by
14615 # actually looking back token by token.
14616 if ( !$too_long && $i_opening_paren > 0 && $list_type eq '=>' ) {
14617 my $i_opening_minus = $i_opening_paren - 4;
14618 if ( $i_opening_minus >= 0 ) {
14619 $too_long = excess_line_length( $i_opening_minus,
14620 $i_effective_last_comma + 1 ) > 0;
14624 # Always break lists contained in '[' and '{' if too long for 1 line,
14625 # and always break lists which are too long and part of a more complex
14627 my $must_break_open_container = $must_break_open
14629 && ( $in_hierarchical_list || $opening_token ne '(' ) );
14631 #print "LISTX: next=$next_nonblank_type avail cols=$columns packed=$packed_columns must format = $must_break_open_container too-long=$too_long opening=$opening_token list_type=$list_type formatted_lines=$formatted_lines packed=$packed_lines max_sparsity= $max_allowed_sparsity sparsity=$sparsity \n";
14633 #---------------------------------------------------------------
14634 # The main decision:
14635 # Now decide if we will align the data into aligned columns. Do not
14636 # attempt to align columns if this is a tiny table or it would be
14637 # too spaced. It seems that the more packed lines we have, the
14638 # sparser the list that can be allowed and still look ok.
14639 #---------------------------------------------------------------
14641 if ( ( $formatted_lines < 3 && $packed_lines < $formatted_lines )
14642 || ( $formatted_lines < 2 )
14643 || ( $unused_columns > $max_allowed_sparsity * $formatted_columns )
14647 #---------------------------------------------------------------
14648 # too sparse: would look ugly if aligned in a table;
14649 #---------------------------------------------------------------
14651 # use old breakpoints if this is a 'big' list
14652 # FIXME: goal is to improve set_ragged_breakpoints so that
14653 # this is not necessary.
14654 if ( $packed_lines > 2 && $item_count > 10 ) {
14655 write_logfile_entry("List sparse: using old breakpoints\n");
14656 copy_old_breakpoints( $i_first_comma, $i_last_comma );
14659 # let the continuation logic handle it if 2 lines
14662 my $break_count = set_ragged_breakpoints( \@i_term_comma,
14663 $ri_ragged_break_list );
14664 ++$break_count if ($use_separate_first_term);
14666 unless ($must_break_open_container) {
14667 if ( $break_count <= 1 ) {
14668 $$rdo_not_break_apart = 1;
14670 elsif ( $rOpts_line_up_parentheses && !$need_lp_break_open )
14672 $$rdo_not_break_apart = 1;
14675 set_non_alignment_flags( $comma_count, $rcomma_index );
14680 #---------------------------------------------------------------
14681 # go ahead and format as a table
14682 #---------------------------------------------------------------
14683 write_logfile_entry(
14684 "List: auto formatting with $number_of_fields fields/row\n");
14686 my $j_first_break =
14687 $use_separate_first_term ? $number_of_fields : $number_of_fields - 1;
14690 my $j = $j_first_break ;
14691 $j < $comma_count ;
14692 $j += $number_of_fields
14695 my $i = $$rcomma_index[$j];
14696 set_forced_breakpoint($i);
14702 sub set_non_alignment_flags {
14704 # set flag which indicates that these commas should not be
14706 my ( $comma_count, $rcomma_index ) = @_;
14707 foreach ( 0 .. $comma_count - 1 ) {
14708 $matching_token_to_go[ $$rcomma_index[$_] ] = 1;
14712 sub study_list_complexity {
14714 # Look for complex tables which should be formatted with one term per line.
14715 # Returns the following:
14717 # \@i_ragged_break_list = list of good breakpoints to avoid lines
14718 # which are hard to read
14719 # $number_of_fields_best = suggested number of fields based on
14720 # complexity; = 0 if any number may be used.
14722 my ( $ri_term_begin, $ri_term_end, $ritem_lengths, $max_width ) = @_;
14723 my $item_count = @{$ri_term_begin};
14724 my $complex_item_count = 0;
14725 my $number_of_fields_best = $rOpts_maximum_fields_per_table;
14726 my $i_max = @{$ritem_lengths} - 1;
14727 ##my @item_complexity;
14729 my $i_last_last_break = -3;
14730 my $i_last_break = -2;
14731 my @i_ragged_break_list;
14733 my $definitely_complex = 30;
14734 my $definitely_simple = 12;
14735 my $quote_count = 0;
14737 for my $i ( 0 .. $i_max ) {
14738 my $ib = $ri_term_begin->[$i];
14739 my $ie = $ri_term_end->[$i];
14741 # define complexity: start with the actual term length
14742 my $weighted_length = ( $ritem_lengths->[$i] - 2 );
14744 ##TBD: join types here and check for variations
14745 ##my $str=join "", @tokens_to_go[$ib..$ie];
14748 if ( $types_to_go[$ib] =~ /^[qQ]$/ ) {
14752 elsif ( $types_to_go[$ib] =~ /^[w\-]$/ ) {
14756 if ( $ib eq $ie ) {
14757 if ( $is_quote && $tokens_to_go[$ib] =~ /\s/ ) {
14758 $complex_item_count++;
14759 $weighted_length *= 2;
14765 if ( grep { $_ eq 'b' } @types_to_go[ $ib .. $ie ] ) {
14766 $complex_item_count++;
14767 $weighted_length *= 2;
14769 if ( grep { $_ eq '..' } @types_to_go[ $ib .. $ie ] ) {
14770 $weighted_length += 4;
14774 # add weight for extra tokens.
14775 $weighted_length += 2 * ( $ie - $ib );
14777 ## my $BUB = join '', @tokens_to_go[$ib..$ie];
14778 ## print "# COMPLEXITY:$weighted_length $BUB\n";
14780 ##push @item_complexity, $weighted_length;
14782 # now mark a ragged break after this item it if it is 'long and
14784 if ( $weighted_length >= $definitely_complex ) {
14786 # if we broke after the previous term
14787 # then break before it too
14788 if ( $i_last_break == $i - 1
14790 && $i_last_last_break != $i - 2 )
14793 ## FIXME: don't strand a small term
14794 pop @i_ragged_break_list;
14795 push @i_ragged_break_list, $i - 2;
14796 push @i_ragged_break_list, $i - 1;
14799 push @i_ragged_break_list, $i;
14800 $i_last_last_break = $i_last_break;
14801 $i_last_break = $i;
14804 # don't break before a small last term -- it will
14805 # not look good on a line by itself.
14806 elsif ($i == $i_max
14807 && $i_last_break == $i - 1
14808 && $weighted_length <= $definitely_simple )
14810 pop @i_ragged_break_list;
14814 my $identifier_count = $i_max + 1 - $quote_count;
14816 # Need more tuning here..
14817 if ( $max_width > 12
14818 && $complex_item_count > $item_count / 2
14819 && $number_of_fields_best != 2 )
14821 $number_of_fields_best = 1;
14824 return ( $number_of_fields_best, \@i_ragged_break_list, $identifier_count );
14827 sub get_maximum_fields_wanted {
14829 # Not all tables look good with more than one field of items.
14830 # This routine looks at a table and decides if it should be
14831 # formatted with just one field or not.
14832 # This coding is still under development.
14833 my ($ritem_lengths) = @_;
14835 my $number_of_fields_best = 0;
14837 # For just a few items, we tentatively assume just 1 field.
14838 my $item_count = @{$ritem_lengths};
14839 if ( $item_count <= 5 ) {
14840 $number_of_fields_best = 1;
14843 # For larger tables, look at it both ways and see what looks best
14847 my @max_length = ( 0, 0 );
14848 my @last_length_2 = ( undef, undef );
14849 my @first_length_2 = ( undef, undef );
14850 my $last_length = undef;
14851 my $total_variation_1 = 0;
14852 my $total_variation_2 = 0;
14853 my @total_variation_2 = ( 0, 0 );
14854 for ( my $j = 0 ; $j < $item_count ; $j++ ) {
14856 $is_odd = 1 - $is_odd;
14857 my $length = $ritem_lengths->[$j];
14858 if ( $length > $max_length[$is_odd] ) {
14859 $max_length[$is_odd] = $length;
14862 if ( defined($last_length) ) {
14863 my $dl = abs( $length - $last_length );
14864 $total_variation_1 += $dl;
14866 $last_length = $length;
14868 my $ll = $last_length_2[$is_odd];
14869 if ( defined($ll) ) {
14870 my $dl = abs( $length - $ll );
14871 $total_variation_2[$is_odd] += $dl;
14874 $first_length_2[$is_odd] = $length;
14876 $last_length_2[$is_odd] = $length;
14878 $total_variation_2 = $total_variation_2[0] + $total_variation_2[1];
14880 my $factor = ( $item_count > 10 ) ? 1 : ( $item_count > 5 ) ? 0.75 : 0;
14881 unless ( $total_variation_2 < $factor * $total_variation_1 ) {
14882 $number_of_fields_best = 1;
14885 return ($number_of_fields_best);
14888 sub table_columns_available {
14889 my $i_first_comma = shift;
14891 $rOpts_maximum_line_length - leading_spaces_to_go($i_first_comma);
14893 # Patch: the vertical formatter does not line up lines whose lengths
14894 # exactly equal the available line length because of allowances
14895 # that must be made for side comments. Therefore, the number of
14896 # available columns is reduced by 1 character.
14901 sub maximum_number_of_fields {
14903 # how many fields will fit in the available space?
14904 my ( $columns, $odd_or_even, $max_width, $pair_width ) = @_;
14905 my $max_pairs = int( $columns / $pair_width );
14906 my $number_of_fields = $max_pairs * 2;
14907 if ( $odd_or_even == 1
14908 && $max_pairs * $pair_width + $max_width <= $columns )
14910 $number_of_fields++;
14912 return $number_of_fields;
14915 sub compactify_table {
14917 # given a table with a certain number of fields and a certain number
14918 # of lines, see if reducing the number of fields will make it look
14920 my ( $item_count, $number_of_fields, $formatted_lines, $odd_or_even ) = @_;
14921 if ( $number_of_fields >= $odd_or_even * 2 && $formatted_lines > 0 ) {
14925 $min_fields = $number_of_fields ;
14926 $min_fields >= $odd_or_even
14927 && $min_fields * $formatted_lines >= $item_count ;
14928 $min_fields -= $odd_or_even
14931 $number_of_fields = $min_fields;
14934 return $number_of_fields;
14937 sub set_ragged_breakpoints {
14939 # Set breakpoints in a list that cannot be formatted nicely as a
14941 my ( $ri_term_comma, $ri_ragged_break_list ) = @_;
14943 my $break_count = 0;
14944 foreach (@$ri_ragged_break_list) {
14945 my $j = $ri_term_comma->[$_];
14947 set_forced_breakpoint($j);
14951 return $break_count;
14954 sub copy_old_breakpoints {
14955 my ( $i_first_comma, $i_last_comma ) = @_;
14956 for my $i ( $i_first_comma .. $i_last_comma ) {
14957 if ( $old_breakpoint_to_go[$i] ) {
14958 set_forced_breakpoint($i);
14964 my ( $i, $j ) = @_;
14965 if ( $i >= 0 && $i <= $j && $j <= $max_index_to_go ) {
14967 FORMATTER_DEBUG_FLAG_NOBREAK && do {
14968 my ( $a, $b, $c ) = caller();
14970 "NOBREAK: forced_breakpoint $forced_breakpoint_count from $a $c with i=$i max=$max_index_to_go type=$types_to_go[$i]\n"
14974 @nobreak_to_go[ $i .. $j ] = (1) x ( $j - $i + 1 );
14977 # shouldn't happen; non-critical error
14979 FORMATTER_DEBUG_FLAG_NOBREAK && do {
14980 my ( $a, $b, $c ) = caller();
14982 "NOBREAK ERROR: from $a $c with i=$i j=$j max=$max_index_to_go\n"
14988 sub set_fake_breakpoint {
14990 # Just bump up the breakpoint count as a signal that there are breaks.
14991 # This is useful if we have breaks but may want to postpone deciding where
14993 $forced_breakpoint_count++;
14996 sub set_forced_breakpoint {
14999 return unless defined $i && $i >= 0;
15001 # when called with certain tokens, use bond strengths to decide
15002 # if we break before or after it
15003 my $token = $tokens_to_go[$i];
15005 if ( $token =~ /^([\=\.\,\:\?]|and|or|xor|&&|\|\|)$/ ) {
15006 if ( $want_break_before{$token} && $i >= 0 ) { $i-- }
15009 # breaks are forced before 'if' and 'unless'
15010 elsif ( $is_if_unless{$token} ) { $i-- }
15012 if ( $i >= 0 && $i <= $max_index_to_go ) {
15013 my $i_nonblank = ( $types_to_go[$i] ne 'b' ) ? $i : $i - 1;
15015 FORMATTER_DEBUG_FLAG_FORCE && do {
15016 my ( $a, $b, $c ) = caller();
15018 "FORCE forced_breakpoint $forced_breakpoint_count from $a $c with i=$i_nonblank max=$max_index_to_go tok=$tokens_to_go[$i_nonblank] type=$types_to_go[$i_nonblank] nobr=$nobreak_to_go[$i_nonblank]\n";
15021 if ( $i_nonblank >= 0 && $nobreak_to_go[$i_nonblank] == 0 ) {
15022 $forced_breakpoint_to_go[$i_nonblank] = 1;
15024 if ( $i_nonblank > $index_max_forced_break ) {
15025 $index_max_forced_break = $i_nonblank;
15027 $forced_breakpoint_count++;
15028 $forced_breakpoint_undo_stack[ $forced_breakpoint_undo_count++ ] =
15031 # if we break at an opening container..break at the closing
15032 if ( $tokens_to_go[$i_nonblank] =~ /^[\{\[\(\?]$/ ) {
15033 set_closing_breakpoint($i_nonblank);
15039 sub clear_breakpoint_undo_stack {
15040 $forced_breakpoint_undo_count = 0;
15043 sub undo_forced_breakpoint_stack {
15045 my $i_start = shift;
15046 if ( $i_start < 0 ) {
15048 my ( $a, $b, $c ) = caller();
15050 "Program Bug: undo_forced_breakpoint_stack from $a $c has i=$i_start "
15054 while ( $forced_breakpoint_undo_count > $i_start ) {
15056 $forced_breakpoint_undo_stack[ --$forced_breakpoint_undo_count ];
15057 if ( $i >= 0 && $i <= $max_index_to_go ) {
15058 $forced_breakpoint_to_go[$i] = 0;
15059 $forced_breakpoint_count--;
15061 FORMATTER_DEBUG_FLAG_UNDOBP && do {
15062 my ( $a, $b, $c ) = caller();
15064 "UNDOBP: undo forced_breakpoint i=$i $forced_breakpoint_undo_count from $a $c max=$max_index_to_go\n"
15069 # shouldn't happen, but not a critical error
15071 FORMATTER_DEBUG_FLAG_UNDOBP && do {
15072 my ( $a, $b, $c ) = caller();
15074 "Program Bug: undo_forced_breakpoint from $a $c has i=$i but max=$max_index_to_go"
15081 sub recombine_breakpoints {
15083 # sub set_continuation_breaks is very liberal in setting line breaks
15084 # for long lines, always setting breaks at good breakpoints, even
15085 # when that creates small lines. Occasionally small line fragments
15086 # are produced which would look better if they were combined.
15087 # That's the task of this routine, recombine_breakpoints.
15088 my ( $ri_first, $ri_last ) = @_;
15089 my $more_to_do = 1;
15091 # We keep looping over all of the lines of this batch
15092 # until there are no more possible recombinations
15093 my $nmax_last = @$ri_last;
15094 while ($more_to_do) {
15098 my $nmax = @$ri_last - 1;
15100 # safety check for infinite loop
15101 unless ( $nmax < $nmax_last ) {
15103 # shouldn't happen because splice below decreases nmax on each pass:
15104 # but i get paranoid sometimes
15105 die "Program bug-infinite loop in recombine breakpoints\n";
15107 $nmax_last = $nmax;
15109 my $previous_outdentable_closing_paren;
15110 my $leading_amp_count = 0;
15111 my $this_line_is_semicolon_terminated;
15113 # loop over all remaining lines in this batch
15114 for $n ( 1 .. $nmax ) {
15116 #----------------------------------------------------------
15117 # If we join the current pair of lines,
15118 # line $n-1 will become the left part of the joined line
15119 # line $n will become the right part of the joined line
15121 # Here are Indexes of the endpoint tokens of the two lines:
15123 # ---left---- | ---right---
15124 # $if $imid | $imidr $il
15126 # We want to decide if we should join tokens $imid to $imidr
15128 # We will apply a number of ad-hoc tests to see if joining
15129 # here will look ok. The code will just issue a 'next'
15130 # command if the join doesn't look good. If we get through
15131 # the gauntlet of tests, the lines will be recombined.
15132 #----------------------------------------------------------
15133 my $if = $$ri_first[ $n - 1 ];
15134 my $il = $$ri_last[$n];
15135 my $imid = $$ri_last[ $n - 1 ];
15136 my $imidr = $$ri_first[$n];
15138 #my $depth_increase=( $nesting_depth_to_go[$imidr] -
15139 # $nesting_depth_to_go[$if] );
15141 ##print "RECOMBINE: n=$n imid=$imid if=$if type=$types_to_go[$if] =$tokens_to_go[$if] next_type=$types_to_go[$imidr] next_tok=$tokens_to_go[$imidr]\n";
15143 # If line $n is the last line, we set some flags and
15144 # do any special checks for it
15145 if ( $n == $nmax ) {
15147 # a terminal '{' should stay where it is
15148 next if $types_to_go[$imidr] eq '{';
15150 # set flag if statement $n ends in ';'
15151 $this_line_is_semicolon_terminated = $types_to_go[$il] eq ';'
15153 # with possible side comment
15154 || ( $types_to_go[$il] eq '#'
15155 && $il - $imidr >= 2
15156 && $types_to_go[ $il - 2 ] eq ';'
15157 && $types_to_go[ $il - 1 ] eq 'b' );
15160 #----------------------------------------------------------
15161 # Section 1: examine token at $imid (right end of first line
15163 #----------------------------------------------------------
15165 # an isolated '}' may join with a ';' terminated segment
15166 if ( $types_to_go[$imid] eq '}' ) {
15168 # Check for cases where combining a semicolon terminated
15169 # statement with a previous isolated closing paren will
15170 # allow the combined line to be outdented. This is
15171 # generally a good move. For example, we can join up
15172 # the last two lines here:
15174 # $dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
15175 # $size, $atime, $mtime, $ctime, $blksize, $blocks
15181 # $dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
15182 # $size, $atime, $mtime, $ctime, $blksize, $blocks
15185 # which makes the parens line up.
15187 # Another example, from Joe Matarazzo, probably looks best
15188 # with the 'or' clause appended to the trailing paren:
15189 # $self->some_method(
15192 # ) or die "Some_method didn't work";
15194 $previous_outdentable_closing_paren =
15195 $this_line_is_semicolon_terminated # ends in ';'
15196 && $if == $imid # only one token on last line
15197 && $tokens_to_go[$imid] eq ')' # must be structural paren
15199 # only &&, ||, and : if no others seen
15200 # (but note: our count made below could be wrong
15201 # due to intervening comments)
15202 && ( $leading_amp_count == 0
15203 || $types_to_go[$imidr] !~ /^(:|\&\&|\|\|)$/ )
15205 # but leading colons probably line up with with a
15206 # previous colon or question (count could be wrong).
15207 && $types_to_go[$imidr] ne ':'
15209 # only one step in depth allowed. this line must not
15210 # begin with a ')' itself.
15211 && ( $nesting_depth_to_go[$imid] ==
15212 $nesting_depth_to_go[$il] + 1 );
15216 $previous_outdentable_closing_paren
15218 # handle '.' and '?' specially below
15219 || ( $types_to_go[$imidr] =~ /^[\.\?]$/ )
15223 # do not recombine lines with ending &&, ||, or :
15224 elsif ( $types_to_go[$imid] =~ /^(|:|\&\&|\|\|)$/ ) {
15225 next unless $want_break_before{ $types_to_go[$imid] };
15228 # for lines ending in a comma...
15229 elsif ( $types_to_go[$imid] eq ',' ) {
15231 # an isolated '},' may join with an identifier + ';'
15232 # this is useful for the class of a 'bless' statement (bless.t)
15233 if ( $types_to_go[$if] eq '}'
15234 && $types_to_go[$imidr] eq 'i' )
15237 unless ( ( $if == ( $imid - 1 ) )
15238 && ( $il == ( $imidr + 1 ) )
15239 && $this_line_is_semicolon_terminated );
15241 # override breakpoint
15242 $forced_breakpoint_to_go[$imid] = 0;
15245 # but otherwise, do not recombine unless this will leave
15248 next unless ( $n + 1 >= $nmax );
15253 elsif ( $types_to_go[$imid] eq '(' ) {
15255 # No longer doing this
15258 elsif ( $types_to_go[$imid] eq ')' ) {
15260 # No longer doing this
15263 # keep a terminal colon
15264 elsif ( $types_to_go[$imid] eq ':' ) {
15268 # keep a terminal for-semicolon
15269 elsif ( $types_to_go[$imid] eq 'f' ) {
15273 # if '=' at end of line ...
15274 elsif ( $is_assignment{ $types_to_go[$imid] } ) {
15276 my $is_short_quote =
15277 ( $types_to_go[$imidr] eq 'Q'
15279 && length( $tokens_to_go[$imidr] ) <
15280 $rOpts_short_concatenation_item_length );
15281 my $ifnmax = $$ri_first[$nmax];
15282 my $ifnp = ( $nmax > $n ) ? $$ri_first[ $n + 1 ] : $ifnmax;
15284 ( $types_to_go[$if] eq '?' && $types_to_go[$ifnp] eq ':' );
15286 # always join an isolated '=', a short quote, or if this
15287 # will put ?/: at start of adjacent lines
15289 && !$is_short_quote
15296 # unless we can reduce this to two lines
15299 # or three lines, the last with a leading semicolon
15300 || ( $nmax == $n + 2
15301 && $types_to_go[$ifnmax] eq ';' )
15303 # or the next line ends with a here doc
15304 || $types_to_go[$il] eq 'h'
15307 # do not recombine if the two lines might align well
15308 # this is a very approximate test for this
15309 && $types_to_go[$imidr] ne $types_to_go[$ifnp]
15312 # -lp users often prefer this:
15313 # my $title = function($env, $env, $sysarea,
15314 # "bubba Borrower Entry");
15315 # so we will recombine if -lp is used we have ending comma
15316 if ( !$rOpts_line_up_parentheses
15317 || $types_to_go[$il] ne ',' )
15320 # otherwise, scan the rhs line up to last token for
15321 # complexity. Note that we are not counting the last
15322 # token in case it is an opening paren.
15324 my $depth = $nesting_depth_to_go[$imidr];
15325 for ( my $i = $imidr + 1 ; $i < $il ; $i++ ) {
15326 if ( $nesting_depth_to_go[$i] != $depth ) {
15328 last if ( $tv > 1 );
15330 $depth = $nesting_depth_to_go[$i];
15333 # ok to recombine if no level changes before last token
15336 # otherwise, do not recombine if more than two
15338 next if ( $tv > 1 );
15340 # check total complexity of the two adjacent lines
15341 # that will occur if we do this join
15343 ( $n < $nmax ) ? $$ri_last[ $n + 1 ] : $il;
15344 for ( my $i = $il ; $i <= $istop ; $i++ ) {
15345 if ( $nesting_depth_to_go[$i] != $depth ) {
15347 last if ( $tv > 2 );
15349 $depth = $nesting_depth_to_go[$i];
15352 # do not recombine if total is more than 2 level changes
15353 next if ( $tv > 2 );
15358 unless ( $tokens_to_go[$imidr] =~ /^[\{\(\[]$/ ) {
15359 $forced_breakpoint_to_go[$imid] = 0;
15364 elsif ( $types_to_go[$imid] eq 'k' ) {
15366 # make major control keywords stand out
15371 #/^(last|next|redo|return)$/
15372 $is_last_next_redo_return{ $tokens_to_go[$imid] }
15374 # but only if followed by multiple lines
15378 if ( $is_and_or{ $tokens_to_go[$imid] } ) {
15379 next unless $want_break_before{ $tokens_to_go[$imid] };
15383 # handle trailing + - * /
15384 elsif ( $types_to_go[$imid] =~ /^[\+\-\*\/]$/ ) {
15385 my $i_next_nonblank = $imidr;
15386 my $i_next_next = $i_next_nonblank + 1;
15387 $i_next_next++ if ( $types_to_go[$i_next_next] eq 'b' );
15389 # do not strand numbers
15392 $types_to_go[$i_next_nonblank] eq 'n'
15394 $i_next_nonblank == $il
15395 || ( $i_next_next == $il
15396 && $types_to_go[$i_next_next] =~ /^[\+\-\*\/]$/ )
15397 || $types_to_go[$i_next_next] eq ';'
15402 #----------------------------------------------------------
15403 # Section 2: Now examine token at $imidr (left end of second
15405 #----------------------------------------------------------
15407 # join lines identified above as capable of
15408 # causing an outdented line with leading closing paren
15409 if ($previous_outdentable_closing_paren) {
15410 $forced_breakpoint_to_go[$imid] = 0;
15413 # do not recombine lines with leading &&, ||, or :
15414 elsif ( $types_to_go[$imidr] =~ /^(:|\&\&|\|\|)$/ ) {
15415 $leading_amp_count++;
15416 next if $want_break_before{ $types_to_go[$imidr] };
15419 # Identify and recombine a broken ?/: chain
15420 elsif ( $types_to_go[$imidr] eq '?' ) {
15422 # indexes of line first tokens --
15423 # mm - line before previous line
15424 # f - previous line
15427 # fff - line after next
15428 my $iff = $n < $nmax ? $$ri_first[ $n + 1 ] : -1;
15429 my $ifff = $n + 2 <= $nmax ? $$ri_first[ $n + 2 ] : -1;
15430 my $imm = $n > 1 ? $$ri_first[ $n - 2 ] : -1;
15431 my $seqno = $type_sequence_to_go[$imidr];
15433 ( $types_to_go[$if] eq ':'
15434 && $type_sequence_to_go[$if] ==
15435 $seqno - TYPE_SEQUENCE_INCREMENT );
15438 && $types_to_go[$imm] eq ':'
15439 && $type_sequence_to_go[$imm] ==
15440 $seqno - 2 * TYPE_SEQUENCE_INCREMENT );
15444 && $types_to_go[$iff] eq ':'
15445 && $type_sequence_to_go[$iff] == $seqno );
15448 && $types_to_go[$ifff] eq ':'
15449 && $type_sequence_to_go[$ifff] ==
15450 $seqno + TYPE_SEQUENCE_INCREMENT );
15452 # we require that this '?' be part of a correct sequence
15453 # of 3 in a row or else no recombination is done.
15455 unless ( ( $ff_ok || $mm_ok ) && ( $f_ok || $fff_ok ) );
15456 $forced_breakpoint_to_go[$imid] = 0;
15459 # do not recombine lines with leading '.'
15460 elsif ( $types_to_go[$imidr] =~ /^(\.)$/ ) {
15461 my $i_next_nonblank = $imidr + 1;
15462 if ( $types_to_go[$i_next_nonblank] eq 'b' ) {
15463 $i_next_nonblank++;
15469 # ... unless there is just one and we can reduce
15470 # this to two lines if we do. For example, this
15474 # '($dummy, $pat) = &get_next_tex_cmd;' . '$args .= $pat;'
15476 # looks better than this:
15477 # $bodyA .= '($dummy, $pat) = &get_next_tex_cmd;'
15478 # . '$args .= $pat;'
15483 && $types_to_go[$if] ne $types_to_go[$imidr]
15486 # ... or this would strand a short quote , like this
15487 # . "some long qoute"
15490 || ( $types_to_go[$i_next_nonblank] eq 'Q'
15491 && $i_next_nonblank >= $il - 1
15492 && length( $tokens_to_go[$i_next_nonblank] ) <
15493 $rOpts_short_concatenation_item_length )
15497 # handle leading keyword..
15498 elsif ( $types_to_go[$imidr] eq 'k' ) {
15500 # handle leading "and" and "or"
15501 if ( $is_and_or{ $tokens_to_go[$imidr] } ) {
15503 # Decide if we will combine a single terminal 'and' and
15504 # 'or' after an 'if' or 'unless'. We should consider the
15505 # possible vertical alignment, and visual clutter.
15507 # This looks best with the 'and' on the same
15508 # line as the 'if':
15511 # if $seconds and $nu < 2;
15513 # But this looks better as shown:
15516 # if !$this->{Parents}{$_}
15517 # or $this->{Parents}{$_} eq $_;
15519 # Eventually, it would be nice to look for
15520 # similarities (such as 'this' or 'Parents'), but
15521 # for now I'm using a simple rule that says that
15522 # the resulting line length must not be more than
15523 # half the maximum line length (making it 80/2 =
15524 # 40 characters by default).
15527 $this_line_is_semicolon_terminated
15530 # following 'if' or 'unless'
15531 $types_to_go[$if] eq 'k'
15532 && $is_if_unless{ $tokens_to_go[$if] }
15538 # handle leading "if" and "unless"
15539 elsif ( $is_if_unless{ $tokens_to_go[$imidr] } ) {
15541 # FIXME: This is still experimental..may not be too useful
15544 $this_line_is_semicolon_terminated
15546 # previous line begins with 'and' or 'or'
15547 && $types_to_go[$if] eq 'k'
15548 && $is_and_or{ $tokens_to_go[$if] }
15553 # handle all other leading keywords
15556 # keywords look best at start of lines,
15557 # but combine things like "1 while"
15558 unless ( $is_assignment{ $types_to_go[$imid] } ) {
15560 if ( ( $types_to_go[$imid] ne 'k' )
15561 && ( $tokens_to_go[$imidr] ne 'while' ) );
15566 # similar treatment of && and || as above for 'and' and 'or':
15567 # NOTE: This block of code is currently bypassed because
15568 # of a previous block but is retained for possible future use.
15569 elsif ( $types_to_go[$imidr] =~ /^(&&|\|\|)$/ ) {
15571 # maybe looking at something like:
15572 # unless $TEXTONLY || $item =~ m%</?(hr>|p>|a|img)%i;
15576 $this_line_is_semicolon_terminated
15578 # previous line begins with an 'if' or 'unless' keyword
15579 && $types_to_go[$if] eq 'k'
15580 && $is_if_unless{ $tokens_to_go[$if] }
15585 # handle leading + - * /
15586 elsif ( $types_to_go[$imidr] =~ /^[\+\-\*\/]$/ ) {
15587 my $i_next_nonblank = $imidr + 1;
15588 if ( $types_to_go[$i_next_nonblank] eq 'b' ) {
15589 $i_next_nonblank++;
15592 my $i_next_next = $i_next_nonblank + 1;
15593 $i_next_next++ if ( $types_to_go[$i_next_next] eq 'b' );
15598 # unless there is just one and we can reduce
15599 # this to two lines if we do. For example, this
15603 && $types_to_go[$if] ne $types_to_go[$imidr]
15606 # do not strand numbers
15608 $types_to_go[$i_next_nonblank] eq 'n'
15609 && ( $i_next_nonblank >= $il - 1
15610 || $types_to_go[$i_next_next] eq ';' )
15615 # handle line with leading = or similar
15616 elsif ( $is_assignment{ $types_to_go[$imidr] } ) {
15617 next unless $n == 1;
15618 my $ifnmax = $$ri_first[$nmax];
15622 # unless we can reduce this to two lines
15625 # or three lines, the last with a leading semicolon
15626 || ( $nmax == 3 && $types_to_go[$ifnmax] eq ';' )
15628 # or the next line ends with a here doc
15629 || $types_to_go[$il] eq 'h'
15633 #----------------------------------------------------------
15635 # Combine the lines if we arrive here and it is possible
15636 #----------------------------------------------------------
15638 # honor hard breakpoints
15639 next if ( $forced_breakpoint_to_go[$imid] > 0 );
15641 my $bs = $bond_strength_to_go[$imid];
15643 # combined line cannot be too long
15645 if excess_line_length( $if, $il ) > 0;
15647 # do not recombine if we would skip in indentation levels
15648 if ( $n < $nmax ) {
15649 my $if_next = $$ri_first[ $n + 1 ];
15652 $levels_to_go[$if] < $levels_to_go[$imidr]
15653 && $levels_to_go[$imidr] < $levels_to_go[$if_next]
15655 # but an isolated 'if (' is undesirable
15658 && $imid - $if <= 2
15659 && $types_to_go[$if] eq 'k'
15660 && $tokens_to_go[$if] eq 'if'
15661 && $tokens_to_go[$imid] ne '('
15667 next if ( $bs == NO_BREAK );
15669 # remember the pair with the greatest bond strength
15676 if ( $bs > $bs_best ) {
15681 # we have 2 or more candidates, so need another pass
15686 # recombine the pair with the greatest bond strength
15688 splice @$ri_first, $n_best, 1;
15689 splice @$ri_last, $n_best - 1, 1;
15692 return ( $ri_first, $ri_last );
15695 sub break_all_chain_tokens {
15697 # scan the current breakpoints looking for breaks at certain "chain
15698 # operators" (. : && || + etc) which often occur repeatedly in a long
15699 # statement. If we see a break at any one, break at all similar tokens
15700 # within the same container.
15703 # does not handle nested ?: operators correctly
15704 # coordinate better with ?: logic in set_continuation_breaks
15705 my ( $ri_left, $ri_right ) = @_;
15707 my %saw_chain_type;
15708 my %left_chain_type;
15709 my %right_chain_type;
15710 my %interior_chain_type;
15711 my $nmax = @$ri_right - 1;
15713 # scan the left and right end tokens of all lines
15715 for my $n ( 0 .. $nmax ) {
15716 my $il = $$ri_left[$n];
15717 my $ir = $$ri_right[$n];
15718 my $typel = $types_to_go[$il];
15719 my $typer = $types_to_go[$ir];
15720 $typel = '+' if ( $typel eq '-' ); # treat + and - the same
15721 $typer = '+' if ( $typer eq '-' );
15722 $typel = '*' if ( $typel eq '/' ); # treat * and / the same
15723 $typer = '*' if ( $typer eq '/' );
15724 my $tokenl = $tokens_to_go[$il];
15725 my $tokenr = $tokens_to_go[$ir];
15727 if ( $is_chain_operator{$tokenl} && $want_break_before{$typel} ) {
15728 next if ( $typel eq '?' );
15729 push @{ $left_chain_type{$typel} }, $il;
15730 $saw_chain_type{$typel} = 1;
15733 if ( $is_chain_operator{$tokenr} && !$want_break_before{$typer} ) {
15734 next if ( $typer eq '?' );
15735 push @{ $right_chain_type{$typer} }, $ir;
15736 $saw_chain_type{$typer} = 1;
15740 return unless $count;
15742 # now look for any interior tokens of the same types
15744 for my $n ( 0 .. $nmax ) {
15745 my $il = $$ri_left[$n];
15746 my $ir = $$ri_right[$n];
15747 for ( my $i = $il + 1 ; $i < $ir ; $i++ ) {
15748 my $type = $types_to_go[$i];
15749 $type = '+' if ( $type eq '-' );
15750 $type = '*' if ( $type eq '/' );
15751 if ( $saw_chain_type{$type} ) {
15752 push @{ $interior_chain_type{$type} }, $i;
15757 return unless $count;
15759 # now make a list of all new break points
15762 # loop over all chain types
15763 foreach my $type ( keys %saw_chain_type ) {
15765 # quit if just ONE continuation line with leading . For example--
15766 # print LATEXFILE '\framebox{\parbox[c][' . $h . '][t]{' . $w . '}{'
15768 last if ( $nmax == 1 && $type =~ /^[\.\+]$/ );
15770 # loop over all interior chain tokens
15771 foreach my $itest ( @{ $interior_chain_type{$type} } ) {
15773 # loop over all left end tokens of same type
15774 if ( $left_chain_type{$type} ) {
15775 next if $nobreak_to_go[ $itest - 1 ];
15776 foreach my $i ( @{ $left_chain_type{$type} } ) {
15777 next unless in_same_container( $i, $itest );
15778 push @insert_list, $itest - 1;
15783 # loop over all right end tokens of same type
15784 if ( $right_chain_type{$type} ) {
15785 next if $nobreak_to_go[$itest];
15786 foreach my $i ( @{ $right_chain_type{$type} } ) {
15787 next unless in_same_container( $i, $itest );
15788 push @insert_list, $itest;
15795 # insert any new break points
15796 if (@insert_list) {
15797 insert_additional_breaks( \@insert_list, $ri_left, $ri_right );
15801 sub in_same_container {
15803 # check to see if tokens at i1 and i2 are in the
15804 # same container, and not separated by a comma, ? or :
15805 my ( $i1, $i2 ) = @_;
15806 my $type = $types_to_go[$i1];
15807 my $depth = $nesting_depth_to_go[$i1];
15808 return unless ( $nesting_depth_to_go[$i2] == $depth );
15809 if ( $i2 < $i1 ) { ( $i1, $i2 ) = ( $i2, $i1 ) }
15810 for ( my $i = $i1 + 1 ; $i < $i2 ; $i++ ) {
15811 next if ( $nesting_depth_to_go[$i] > $depth );
15812 return if ( $nesting_depth_to_go[$i] < $depth );
15814 my $tok = $tokens_to_go[$i];
15815 $tok = ',' if $tok eq '=>'; # treat => same as ,
15817 # Example: we would not want to break at any of these .'s
15818 # : "<A HREF=\"#item_" . htmlify( 0, $s2 ) . "\">$str</A>"
15819 if ( $type ne ':' ) {
15820 return if ( $tok =~ /^[\,\:\?]$/ ) || $tok eq '||' || $tok eq 'or';
15823 return if ( $tok =~ /^[\,]$/ );
15829 sub set_continuation_breaks {
15831 # Define an array of indexes for inserting newline characters to
15832 # keep the line lengths below the maximum desired length. There is
15833 # an implied break after the last token, so it need not be included.
15836 # This routine is part of series of routines which adjust line
15837 # lengths. It is only called if a statement is longer than the
15838 # maximum line length, or if a preliminary scanning located
15839 # desirable break points. Sub scan_list has already looked at
15840 # these tokens and set breakpoints (in array
15841 # $forced_breakpoint_to_go[$i]) where it wants breaks (for example
15842 # after commas, after opening parens, and before closing parens).
15843 # This routine will honor these breakpoints and also add additional
15844 # breakpoints as necessary to keep the line length below the maximum
15845 # requested. It bases its decision on where the 'bond strength' is
15848 # Output: returns references to the arrays:
15851 # which contain the indexes $i of the first and last tokens on each
15854 # In addition, the array:
15855 # $forced_breakpoint_to_go[$i]
15856 # may be updated to be =1 for any index $i after which there must be
15857 # a break. This signals later routines not to undo the breakpoint.
15859 my $saw_good_break = shift;
15860 my @i_first = (); # the first index to output
15861 my @i_last = (); # the last index to output
15862 my @i_colon_breaks = (); # needed to decide if we have to break at ?'s
15863 if ( $types_to_go[0] eq ':' ) { push @i_colon_breaks, 0 }
15865 set_bond_strengths();
15868 my $imax = $max_index_to_go;
15869 if ( $types_to_go[$imin] eq 'b' ) { $imin++ }
15870 if ( $types_to_go[$imax] eq 'b' ) { $imax-- }
15871 my $i_begin = $imin; # index for starting next iteration
15873 my $leading_spaces = leading_spaces_to_go($imin);
15874 my $line_count = 0;
15875 my $last_break_strength = NO_BREAK;
15876 my $i_last_break = -1;
15877 my $max_bias = 0.001;
15878 my $tiny_bias = 0.0001;
15879 my $leading_alignment_token = "";
15880 my $leading_alignment_type = "";
15882 # see if any ?/:'s are in order
15883 my $colons_in_order = 1;
15885 my @colon_list = grep /^[\?\:]$/, @tokens_to_go[ 0 .. $max_index_to_go ];
15886 foreach (@colon_list) {
15887 if ( $_ eq $last_tok ) { $colons_in_order = 0; last }
15891 # This is a sufficient but not necessary condition for colon chain
15892 my $is_colon_chain = ( $colons_in_order && @colon_list > 2 );
15894 #-------------------------------------------------------
15895 # BEGINNING of main loop to set continuation breakpoints
15896 # Keep iterating until we reach the end
15897 #-------------------------------------------------------
15898 while ( $i_begin <= $imax ) {
15899 my $lowest_strength = NO_BREAK;
15900 my $starting_sum = $lengths_to_go[$i_begin];
15903 my $lowest_next_token = '';
15904 my $lowest_next_type = 'b';
15905 my $i_lowest_next_nonblank = -1;
15907 #-------------------------------------------------------
15908 # BEGINNING of inner loop to find the best next breakpoint
15909 #-------------------------------------------------------
15910 for ( $i_test = $i_begin ; $i_test <= $imax ; $i_test++ ) {
15911 my $type = $types_to_go[$i_test];
15912 my $token = $tokens_to_go[$i_test];
15913 my $next_type = $types_to_go[ $i_test + 1 ];
15914 my $next_token = $tokens_to_go[ $i_test + 1 ];
15915 my $i_next_nonblank =
15916 ( ( $next_type eq 'b' ) ? $i_test + 2 : $i_test + 1 );
15917 my $next_nonblank_type = $types_to_go[$i_next_nonblank];
15918 my $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
15919 my $next_nonblank_block_type = $block_type_to_go[$i_next_nonblank];
15920 my $strength = $bond_strength_to_go[$i_test];
15921 my $must_break = 0;
15923 # FIXME: TESTING: Might want to be able to break after these
15924 # force an immediate break at certain operators
15925 # with lower level than the start of the line
15928 $next_nonblank_type =~ /^(\.|\&\&|\|\|)$/
15929 || ( $next_nonblank_type eq 'k'
15930 && $next_nonblank_token =~ /^(and|or)$/ )
15932 && ( $nesting_depth_to_go[$i_begin] >
15933 $nesting_depth_to_go[$i_next_nonblank] )
15936 set_forced_breakpoint($i_next_nonblank);
15941 # Try to put a break where requested by scan_list
15942 $forced_breakpoint_to_go[$i_test]
15944 # break between ) { in a continued line so that the '{' can
15946 # See similar logic in scan_list which catches instances
15947 # where a line is just something like ') {'
15949 && ( $token eq ')' )
15950 && ( $next_nonblank_type eq '{' )
15951 && ($next_nonblank_block_type)
15952 && !$rOpts->{'opening-brace-always-on-right'} )
15954 # There is an implied forced break at a terminal opening brace
15955 || ( ( $type eq '{' ) && ( $i_test == $imax ) )
15959 # Forced breakpoints must sometimes be overridden, for example
15960 # because of a side comment causing a NO_BREAK. It is easier
15961 # to catch this here than when they are set.
15962 if ( $strength < NO_BREAK ) {
15963 $strength = $lowest_strength - $tiny_bias;
15968 # quit if a break here would put a good terminal token on
15969 # the next line and we already have a possible break
15972 && ( $next_nonblank_type =~ /^[\;\,]$/ )
15976 $lengths_to_go[ $i_next_nonblank + 1 ] -
15978 ) > $rOpts_maximum_line_length
15982 last if ( $i_lowest >= 0 );
15985 # Avoid a break which would strand a single punctuation
15986 # token. For example, we do not want to strand a leading
15987 # '.' which is followed by a long quoted string.
15990 && ( $i_test == $i_begin )
15991 && ( $i_test < $imax )
15992 && ( $token eq $type )
15996 $lengths_to_go[ $i_test + 1 ] -
15998 ) <= $rOpts_maximum_line_length
16004 if ( ( $i_test < $imax ) && ( $next_type eq 'b' ) ) {
16010 if ( ( $strength <= $lowest_strength ) && ( $strength < NO_BREAK ) )
16013 # break at previous best break if it would have produced
16014 # a leading alignment of certain common tokens, and it
16015 # is different from the latest candidate break
16017 if ($leading_alignment_type);
16019 # Force at least one breakpoint if old code had good
16020 # break It is only called if a breakpoint is required or
16021 # desired. This will probably need some adjustments
16022 # over time. A goal is to try to be sure that, if a new
16023 # side comment is introduced into formated text, then
16024 # the same breakpoints will occur. scbreak.t
16027 $i_test == $imax # we are at the end
16028 && !$forced_breakpoint_count #
16029 && $saw_good_break # old line had good break
16030 && $type =~ /^[#;\{]$/ # and this line ends in
16031 # ';' or side comment
16032 && $i_last_break < 0 # and we haven't made a break
16033 && $i_lowest > 0 # and we saw a possible break
16034 && $i_lowest < $imax - 1 # (but not just before this ;)
16035 && $strength - $lowest_strength < 0.5 * WEAK # and it's good
16038 $lowest_strength = $strength;
16039 $i_lowest = $i_test;
16040 $lowest_next_token = $next_nonblank_token;
16041 $lowest_next_type = $next_nonblank_type;
16042 $i_lowest_next_nonblank = $i_next_nonblank;
16043 last if $must_break;
16045 # set flags to remember if a break here will produce a
16046 # leading alignment of certain common tokens
16047 if ( $line_count > 0
16049 && ( $lowest_strength - $last_break_strength <= $max_bias )
16052 my $i_last_end = $i_begin - 1;
16053 if ( $types_to_go[$i_last_end] eq 'b' ) { $i_last_end -= 1 }
16054 my $tok_beg = $tokens_to_go[$i_begin];
16055 my $type_beg = $types_to_go[$i_begin];
16058 # check for leading alignment of certain tokens
16060 $tok_beg eq $next_nonblank_token
16061 && $is_chain_operator{$tok_beg}
16062 && ( $type_beg eq 'k'
16063 || $type_beg eq $tok_beg )
16064 && $nesting_depth_to_go[$i_begin] >=
16065 $nesting_depth_to_go[$i_next_nonblank]
16068 || ( $tokens_to_go[$i_last_end] eq $token
16069 && $is_chain_operator{$token}
16070 && ( $type eq 'k' || $type eq $token )
16071 && $nesting_depth_to_go[$i_last_end] >=
16072 $nesting_depth_to_go[$i_test] )
16075 $leading_alignment_token = $next_nonblank_token;
16076 $leading_alignment_type = $next_nonblank_type;
16082 ( $i_test >= $imax )
16087 $lengths_to_go[ $i_test + 2 ] -
16089 ) > $rOpts_maximum_line_length
16092 FORMATTER_DEBUG_FLAG_BREAK
16094 "BREAK: testing i = $i_test imax=$imax $types_to_go[$i_test] $next_nonblank_type leading sp=($leading_spaces) next length = $lengths_to_go[$i_test+2] too_long=$too_long str=$strength\n";
16096 # allow one extra terminal token after exceeding line length
16097 # if it would strand this token.
16098 if ( $rOpts_fuzzy_line_length
16100 && ( $i_lowest == $i_test )
16101 && ( length($token) > 1 )
16102 && ( $next_nonblank_type =~ /^[\;\,]$/ ) )
16109 ( $i_test == $imax ) # we're done if no more tokens,
16111 ( $i_lowest >= 0 ) # or no more space and we have a break
16117 #-------------------------------------------------------
16118 # END of inner loop to find the best next breakpoint
16119 # Now decide exactly where to put the breakpoint
16120 #-------------------------------------------------------
16122 # it's always ok to break at imax if no other break was found
16123 if ( $i_lowest < 0 ) { $i_lowest = $imax }
16125 # semi-final index calculation
16126 my $i_next_nonblank = (
16127 ( $types_to_go[ $i_lowest + 1 ] eq 'b' )
16131 my $next_nonblank_type = $types_to_go[$i_next_nonblank];
16132 my $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
16134 #-------------------------------------------------------
16135 # ?/: rule 1 : if a break here will separate a '?' on this
16136 # line from its closing ':', then break at the '?' instead.
16137 #-------------------------------------------------------
16139 foreach $i ( $i_begin + 1 .. $i_lowest - 1 ) {
16140 next unless ( $tokens_to_go[$i] eq '?' );
16142 # do not break if probable sequence of ?/: statements
16143 next if ($is_colon_chain);
16145 # do not break if statement is broken by side comment
16148 $tokens_to_go[$max_index_to_go] eq '#'
16149 && terminal_type( \@types_to_go, \@block_type_to_go, 0,
16150 $max_index_to_go ) !~ /^[\;\}]$/
16153 # no break needed if matching : is also on the line
16155 if ( $mate_index_to_go[$i] >= 0
16156 && $mate_index_to_go[$i] <= $i_next_nonblank );
16159 if ( $want_break_before{'?'} ) { $i_lowest-- }
16163 #-------------------------------------------------------
16164 # END of inner loop to find the best next breakpoint:
16165 # Break the line after the token with index i=$i_lowest
16166 #-------------------------------------------------------
16168 # final index calculation
16169 $i_next_nonblank = (
16170 ( $types_to_go[ $i_lowest + 1 ] eq 'b' )
16174 $next_nonblank_type = $types_to_go[$i_next_nonblank];
16175 $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
16177 FORMATTER_DEBUG_FLAG_BREAK
16178 && print "BREAK: best is i = $i_lowest strength = $lowest_strength\n";
16180 #-------------------------------------------------------
16181 # ?/: rule 2 : if we break at a '?', then break at its ':'
16183 # Note: this rule is also in sub scan_list to handle a break
16184 # at the start and end of a line (in case breaks are dictated
16185 # by side comments).
16186 #-------------------------------------------------------
16187 if ( $next_nonblank_type eq '?' ) {
16188 set_closing_breakpoint($i_next_nonblank);
16190 elsif ( $types_to_go[$i_lowest] eq '?' ) {
16191 set_closing_breakpoint($i_lowest);
16194 #-------------------------------------------------------
16195 # ?/: rule 3 : if we break at a ':' then we save
16196 # its location for further work below. We may need to go
16197 # back and break at its '?'.
16198 #-------------------------------------------------------
16199 if ( $next_nonblank_type eq ':' ) {
16200 push @i_colon_breaks, $i_next_nonblank;
16202 elsif ( $types_to_go[$i_lowest] eq ':' ) {
16203 push @i_colon_breaks, $i_lowest;
16206 # here we should set breaks for all '?'/':' pairs which are
16207 # separated by this line
16211 # save this line segment, after trimming blanks at the ends
16213 ( $types_to_go[$i_begin] eq 'b' ) ? $i_begin + 1 : $i_begin );
16215 ( $types_to_go[$i_lowest] eq 'b' ) ? $i_lowest - 1 : $i_lowest );
16217 # set a forced breakpoint at a container opening, if necessary, to
16218 # signal a break at a closing container. Excepting '(' for now.
16219 if ( $tokens_to_go[$i_lowest] =~ /^[\{\[]$/
16220 && !$forced_breakpoint_to_go[$i_lowest] )
16222 set_closing_breakpoint($i_lowest);
16225 # get ready to go again
16226 $i_begin = $i_lowest + 1;
16227 $last_break_strength = $lowest_strength;
16228 $i_last_break = $i_lowest;
16229 $leading_alignment_token = "";
16230 $leading_alignment_type = "";
16231 $lowest_next_token = '';
16232 $lowest_next_type = 'b';
16234 if ( ( $i_begin <= $imax ) && ( $types_to_go[$i_begin] eq 'b' ) ) {
16238 # update indentation size
16239 if ( $i_begin <= $imax ) {
16240 $leading_spaces = leading_spaces_to_go($i_begin);
16244 #-------------------------------------------------------
16245 # END of main loop to set continuation breakpoints
16246 # Now go back and make any necessary corrections
16247 #-------------------------------------------------------
16249 #-------------------------------------------------------
16250 # ?/: rule 4 -- if we broke at a ':', then break at
16251 # corresponding '?' unless this is a chain of ?: expressions
16252 #-------------------------------------------------------
16253 if (@i_colon_breaks) {
16255 # using a simple method for deciding if we are in a ?/: chain --
16256 # this is a chain if it has multiple ?/: pairs all in order;
16258 # Note that if line starts in a ':' we count that above as a break
16259 my $is_chain = ( $colons_in_order && @i_colon_breaks > 1 );
16261 unless ($is_chain) {
16262 my @insert_list = ();
16263 foreach (@i_colon_breaks) {
16264 my $i_question = $mate_index_to_go[$_];
16265 if ( $i_question >= 0 ) {
16266 if ( $want_break_before{'?'} ) {
16268 if ( $i_question > 0
16269 && $types_to_go[$i_question] eq 'b' )
16275 if ( $i_question >= 0 ) {
16276 push @insert_list, $i_question;
16279 insert_additional_breaks( \@insert_list, \@i_first, \@i_last );
16283 return \@i_first, \@i_last;
16286 sub insert_additional_breaks {
16288 # this routine will add line breaks at requested locations after
16289 # sub set_continuation_breaks has made preliminary breaks.
16291 my ( $ri_break_list, $ri_first, $ri_last ) = @_;
16294 my $line_number = 0;
16296 foreach $i_break_left ( sort { $a <=> $b } @$ri_break_list ) {
16298 $i_f = $$ri_first[$line_number];
16299 $i_l = $$ri_last[$line_number];
16300 while ( $i_break_left >= $i_l ) {
16303 # shouldn't happen unless caller passes bad indexes
16304 if ( $line_number >= @$ri_last ) {
16306 "Non-fatal program bug: couldn't set break at $i_break_left\n"
16308 report_definite_bug();
16311 $i_f = $$ri_first[$line_number];
16312 $i_l = $$ri_last[$line_number];
16315 my $i_break_right = $i_break_left + 1;
16316 if ( $types_to_go[$i_break_right] eq 'b' ) { $i_break_right++ }
16318 if ( $i_break_left >= $i_f
16319 && $i_break_left < $i_l
16320 && $i_break_right > $i_f
16321 && $i_break_right <= $i_l )
16323 splice( @$ri_first, $line_number, 1, ( $i_f, $i_break_right ) );
16324 splice( @$ri_last, $line_number, 1, ( $i_break_left, $i_l ) );
16329 sub set_closing_breakpoint {
16331 # set a breakpoint at a matching closing token
16332 # at present, this is only used to break at a ':' which matches a '?'
16333 my $i_break = shift;
16335 if ( $mate_index_to_go[$i_break] >= 0 ) {
16337 # CAUTION: infinite recursion possible here:
16338 # set_closing_breakpoint calls set_forced_breakpoint, and
16339 # set_forced_breakpoint call set_closing_breakpoint
16340 # ( test files attrib.t, BasicLyx.pm.html).
16341 # Don't reduce the '2' in the statement below
16342 if ( $mate_index_to_go[$i_break] > $i_break + 2 ) {
16344 # break before } ] and ), but sub set_forced_breakpoint will decide
16345 # to break before or after a ? and :
16346 my $inc = ( $tokens_to_go[$i_break] eq '?' ) ? 0 : 1;
16347 set_forced_breakpoint( $mate_index_to_go[$i_break] - $inc );
16351 my $type_sequence = $type_sequence_to_go[$i_break];
16352 if ($type_sequence) {
16353 my $closing_token = $matching_token{ $tokens_to_go[$i_break] };
16354 $postponed_breakpoint{$type_sequence} = 1;
16359 # check to see if output line tabbing agrees with input line
16360 # this can be very useful for debugging a script which has an extra
16362 sub compare_indentation_levels {
16364 my ( $python_indentation_level, $structural_indentation_level ) = @_;
16365 if ( ( $python_indentation_level ne $structural_indentation_level ) ) {
16366 $last_tabbing_disagreement = $input_line_number;
16368 if ($in_tabbing_disagreement) {
16371 $tabbing_disagreement_count++;
16373 if ( $tabbing_disagreement_count <= MAX_NAG_MESSAGES ) {
16374 write_logfile_entry(
16375 "Start indentation disagreement: input=$python_indentation_level; output=$structural_indentation_level\n"
16378 $in_tabbing_disagreement = $input_line_number;
16379 $first_tabbing_disagreement = $in_tabbing_disagreement
16380 unless ($first_tabbing_disagreement);
16385 if ($in_tabbing_disagreement) {
16387 if ( $tabbing_disagreement_count <= MAX_NAG_MESSAGES ) {
16388 write_logfile_entry(
16389 "End indentation disagreement from input line $in_tabbing_disagreement\n"
16392 if ( $tabbing_disagreement_count == MAX_NAG_MESSAGES ) {
16393 write_logfile_entry(
16394 "No further tabbing disagreements will be noted\n");
16397 $in_tabbing_disagreement = 0;
16402 #####################################################################
16404 # the Perl::Tidy::IndentationItem class supplies items which contain
16405 # how much whitespace should be used at the start of a line
16407 #####################################################################
16409 package Perl::Tidy::IndentationItem;
16411 # Indexes for indentation items
16412 use constant SPACES => 0; # total leading white spaces
16413 use constant LEVEL => 1; # the indentation 'level'
16414 use constant CI_LEVEL => 2; # the 'continuation level'
16415 use constant AVAILABLE_SPACES => 3; # how many left spaces available
16417 use constant CLOSED => 4; # index where we saw closing '}'
16418 use constant COMMA_COUNT => 5; # how many commas at this level?
16419 use constant SEQUENCE_NUMBER => 6; # output batch number
16420 use constant INDEX => 7; # index in output batch list
16421 use constant HAVE_CHILD => 8; # any dependents?
16422 use constant RECOVERABLE_SPACES => 9; # how many spaces to the right
16423 # we would like to move to get
16424 # alignment (negative if left)
16425 use constant ALIGN_PAREN => 10; # do we want to try to align
16426 # with an opening structure?
16427 use constant MARKED => 11; # if visited by corrector logic
16428 use constant STACK_DEPTH => 12; # indentation nesting depth
16429 use constant STARTING_INDEX => 13; # first token index of this level
16430 use constant ARROW_COUNT => 14; # how many =>'s
16434 # Create an 'indentation_item' which describes one level of leading
16435 # whitespace when the '-lp' indentation is used. We return
16436 # a reference to an anonymous array of associated variables.
16437 # See above constants for storage scheme.
16439 $class, $spaces, $level,
16440 $ci_level, $available_spaces, $index,
16441 $gnu_sequence_number, $align_paren, $stack_depth,
16445 my $arrow_count = 0;
16446 my $comma_count = 0;
16447 my $have_child = 0;
16448 my $want_right_spaces = 0;
16451 $spaces, $level, $ci_level,
16452 $available_spaces, $closed, $comma_count,
16453 $gnu_sequence_number, $index, $have_child,
16454 $want_right_spaces, $align_paren, $marked,
16455 $stack_depth, $starting_index, $arrow_count,
16459 sub permanently_decrease_AVAILABLE_SPACES {
16461 # make a permanent reduction in the available indentation spaces
16462 # at one indentation item. NOTE: if there are child nodes, their
16463 # total SPACES must be reduced by the caller.
16465 my ( $item, $spaces_needed ) = @_;
16466 my $available_spaces = $item->get_AVAILABLE_SPACES();
16467 my $deleted_spaces =
16468 ( $available_spaces > $spaces_needed )
16470 : $available_spaces;
16471 $item->decrease_AVAILABLE_SPACES($deleted_spaces);
16472 $item->decrease_SPACES($deleted_spaces);
16473 $item->set_RECOVERABLE_SPACES(0);
16475 return $deleted_spaces;
16478 sub tentatively_decrease_AVAILABLE_SPACES {
16480 # We are asked to tentatively delete $spaces_needed of indentation
16481 # for a indentation item. We may want to undo this later. NOTE: if
16482 # there are child nodes, their total SPACES must be reduced by the
16484 my ( $item, $spaces_needed ) = @_;
16485 my $available_spaces = $item->get_AVAILABLE_SPACES();
16486 my $deleted_spaces =
16487 ( $available_spaces > $spaces_needed )
16489 : $available_spaces;
16490 $item->decrease_AVAILABLE_SPACES($deleted_spaces);
16491 $item->decrease_SPACES($deleted_spaces);
16492 $item->increase_RECOVERABLE_SPACES($deleted_spaces);
16493 return $deleted_spaces;
16496 sub get_STACK_DEPTH {
16498 return $self->[STACK_DEPTH];
16503 return $self->[SPACES];
16508 return $self->[MARKED];
16512 my ( $self, $value ) = @_;
16513 if ( defined($value) ) {
16514 $self->[MARKED] = $value;
16516 return $self->[MARKED];
16519 sub get_AVAILABLE_SPACES {
16521 return $self->[AVAILABLE_SPACES];
16524 sub decrease_SPACES {
16525 my ( $self, $value ) = @_;
16526 if ( defined($value) ) {
16527 $self->[SPACES] -= $value;
16529 return $self->[SPACES];
16532 sub decrease_AVAILABLE_SPACES {
16533 my ( $self, $value ) = @_;
16534 if ( defined($value) ) {
16535 $self->[AVAILABLE_SPACES] -= $value;
16537 return $self->[AVAILABLE_SPACES];
16540 sub get_ALIGN_PAREN {
16542 return $self->[ALIGN_PAREN];
16545 sub get_RECOVERABLE_SPACES {
16547 return $self->[RECOVERABLE_SPACES];
16550 sub set_RECOVERABLE_SPACES {
16551 my ( $self, $value ) = @_;
16552 if ( defined($value) ) {
16553 $self->[RECOVERABLE_SPACES] = $value;
16555 return $self->[RECOVERABLE_SPACES];
16558 sub increase_RECOVERABLE_SPACES {
16559 my ( $self, $value ) = @_;
16560 if ( defined($value) ) {
16561 $self->[RECOVERABLE_SPACES] += $value;
16563 return $self->[RECOVERABLE_SPACES];
16568 return $self->[CI_LEVEL];
16573 return $self->[LEVEL];
16576 sub get_SEQUENCE_NUMBER {
16578 return $self->[SEQUENCE_NUMBER];
16583 return $self->[INDEX];
16586 sub get_STARTING_INDEX {
16588 return $self->[STARTING_INDEX];
16591 sub set_HAVE_CHILD {
16592 my ( $self, $value ) = @_;
16593 if ( defined($value) ) {
16594 $self->[HAVE_CHILD] = $value;
16596 return $self->[HAVE_CHILD];
16599 sub get_HAVE_CHILD {
16601 return $self->[HAVE_CHILD];
16604 sub set_ARROW_COUNT {
16605 my ( $self, $value ) = @_;
16606 if ( defined($value) ) {
16607 $self->[ARROW_COUNT] = $value;
16609 return $self->[ARROW_COUNT];
16612 sub get_ARROW_COUNT {
16614 return $self->[ARROW_COUNT];
16617 sub set_COMMA_COUNT {
16618 my ( $self, $value ) = @_;
16619 if ( defined($value) ) {
16620 $self->[COMMA_COUNT] = $value;
16622 return $self->[COMMA_COUNT];
16625 sub get_COMMA_COUNT {
16627 return $self->[COMMA_COUNT];
16631 my ( $self, $value ) = @_;
16632 if ( defined($value) ) {
16633 $self->[CLOSED] = $value;
16635 return $self->[CLOSED];
16640 return $self->[CLOSED];
16643 #####################################################################
16645 # the Perl::Tidy::VerticalAligner::Line class supplies an object to
16646 # contain a single output line
16648 #####################################################################
16650 package Perl::Tidy::VerticalAligner::Line;
16657 use constant JMAX => 0;
16658 use constant JMAX_ORIGINAL_LINE => 1;
16659 use constant RTOKENS => 2;
16660 use constant RFIELDS => 3;
16661 use constant RPATTERNS => 4;
16662 use constant INDENTATION => 5;
16663 use constant LEADING_SPACE_COUNT => 6;
16664 use constant OUTDENT_LONG_LINES => 7;
16665 use constant LIST_TYPE => 8;
16666 use constant IS_HANGING_SIDE_COMMENT => 9;
16667 use constant RALIGNMENTS => 10;
16668 use constant MAXIMUM_LINE_LENGTH => 11;
16669 use constant RVERTICAL_TIGHTNESS_FLAGS => 12;
16672 $_index_map{jmax} = JMAX;
16673 $_index_map{jmax_original_line} = JMAX_ORIGINAL_LINE;
16674 $_index_map{rtokens} = RTOKENS;
16675 $_index_map{rfields} = RFIELDS;
16676 $_index_map{rpatterns} = RPATTERNS;
16677 $_index_map{indentation} = INDENTATION;
16678 $_index_map{leading_space_count} = LEADING_SPACE_COUNT;
16679 $_index_map{outdent_long_lines} = OUTDENT_LONG_LINES;
16680 $_index_map{list_type} = LIST_TYPE;
16681 $_index_map{is_hanging_side_comment} = IS_HANGING_SIDE_COMMENT;
16682 $_index_map{ralignments} = RALIGNMENTS;
16683 $_index_map{maximum_line_length} = MAXIMUM_LINE_LENGTH;
16684 $_index_map{rvertical_tightness_flags} = RVERTICAL_TIGHTNESS_FLAGS;
16686 my @_default_data = ();
16687 $_default_data[JMAX] = undef;
16688 $_default_data[JMAX_ORIGINAL_LINE] = undef;
16689 $_default_data[RTOKENS] = undef;
16690 $_default_data[RFIELDS] = undef;
16691 $_default_data[RPATTERNS] = undef;
16692 $_default_data[INDENTATION] = undef;
16693 $_default_data[LEADING_SPACE_COUNT] = undef;
16694 $_default_data[OUTDENT_LONG_LINES] = undef;
16695 $_default_data[LIST_TYPE] = undef;
16696 $_default_data[IS_HANGING_SIDE_COMMENT] = undef;
16697 $_default_data[RALIGNMENTS] = [];
16698 $_default_data[MAXIMUM_LINE_LENGTH] = undef;
16699 $_default_data[RVERTICAL_TIGHTNESS_FLAGS] = undef;
16703 # methods to count object population
16705 sub get_count { $_count; }
16706 sub _increment_count { ++$_count }
16707 sub _decrement_count { --$_count }
16710 # Constructor may be called as a class method
16712 my ( $caller, %arg ) = @_;
16713 my $caller_is_obj = ref($caller);
16714 my $class = $caller_is_obj || $caller;
16716 my $self = bless [], $class;
16718 $self->[RALIGNMENTS] = [];
16721 foreach ( keys %_index_map ) {
16722 $index = $_index_map{$_};
16723 if ( exists $arg{$_} ) { $self->[$index] = $arg{$_} }
16724 elsif ($caller_is_obj) { $self->[$index] = $caller->[$index] }
16725 else { $self->[$index] = $_default_data[$index] }
16728 $self->_increment_count();
16733 $_[0]->_decrement_count();
16736 sub get_jmax { $_[0]->[JMAX] }
16737 sub get_jmax_original_line { $_[0]->[JMAX_ORIGINAL_LINE] }
16738 sub get_rtokens { $_[0]->[RTOKENS] }
16739 sub get_rfields { $_[0]->[RFIELDS] }
16740 sub get_rpatterns { $_[0]->[RPATTERNS] }
16741 sub get_indentation { $_[0]->[INDENTATION] }
16742 sub get_leading_space_count { $_[0]->[LEADING_SPACE_COUNT] }
16743 sub get_outdent_long_lines { $_[0]->[OUTDENT_LONG_LINES] }
16744 sub get_list_type { $_[0]->[LIST_TYPE] }
16745 sub get_is_hanging_side_comment { $_[0]->[IS_HANGING_SIDE_COMMENT] }
16746 sub get_rvertical_tightness_flags { $_[0]->[RVERTICAL_TIGHTNESS_FLAGS] }
16748 sub set_column { $_[0]->[RALIGNMENTS]->[ $_[1] ]->set_column( $_[2] ) }
16749 sub get_alignment { $_[0]->[RALIGNMENTS]->[ $_[1] ] }
16750 sub get_alignments { @{ $_[0]->[RALIGNMENTS] } }
16751 sub get_column { $_[0]->[RALIGNMENTS]->[ $_[1] ]->get_column() }
16753 sub get_starting_column {
16754 $_[0]->[RALIGNMENTS]->[ $_[1] ]->get_starting_column();
16757 sub increment_column {
16758 $_[0]->[RALIGNMENTS]->[ $_[1] ]->increment_column( $_[2] );
16760 sub set_alignments { my $self = shift; @{ $self->[RALIGNMENTS] } = @_; }
16762 sub current_field_width {
16766 return $self->get_column($j);
16769 return $self->get_column($j) - $self->get_column( $j - 1 );
16773 sub field_width_growth {
16776 return $self->get_column($j) - $self->get_starting_column($j);
16779 sub starting_field_width {
16783 return $self->get_starting_column($j);
16786 return $self->get_starting_column($j) -
16787 $self->get_starting_column( $j - 1 );
16791 sub increase_field_width {
16794 my ( $j, $pad ) = @_;
16795 my $jmax = $self->get_jmax();
16796 for my $k ( $j .. $jmax ) {
16797 $self->increment_column( $k, $pad );
16801 sub get_available_space_on_right {
16803 my $jmax = $self->get_jmax();
16804 return $self->[MAXIMUM_LINE_LENGTH] - $self->get_column($jmax);
16807 sub set_jmax { $_[0]->[JMAX] = $_[1] }
16808 sub set_jmax_original_line { $_[0]->[JMAX_ORIGINAL_LINE] = $_[1] }
16809 sub set_rtokens { $_[0]->[RTOKENS] = $_[1] }
16810 sub set_rfields { $_[0]->[RFIELDS] = $_[1] }
16811 sub set_rpatterns { $_[0]->[RPATTERNS] = $_[1] }
16812 sub set_indentation { $_[0]->[INDENTATION] = $_[1] }
16813 sub set_leading_space_count { $_[0]->[LEADING_SPACE_COUNT] = $_[1] }
16814 sub set_outdent_long_lines { $_[0]->[OUTDENT_LONG_LINES] = $_[1] }
16815 sub set_list_type { $_[0]->[LIST_TYPE] = $_[1] }
16816 sub set_is_hanging_side_comment { $_[0]->[IS_HANGING_SIDE_COMMENT] = $_[1] }
16817 sub set_alignment { $_[0]->[RALIGNMENTS]->[ $_[1] ] = $_[2] }
16821 #####################################################################
16823 # the Perl::Tidy::VerticalAligner::Alignment class holds information
16824 # on a single column being aligned
16826 #####################################################################
16827 package Perl::Tidy::VerticalAligner::Alignment;
16835 # Symbolic array indexes
16836 use constant COLUMN => 0; # the current column number
16837 use constant STARTING_COLUMN => 1; # column number when created
16838 use constant MATCHING_TOKEN => 2; # what token we are matching
16839 use constant STARTING_LINE => 3; # the line index of creation
16840 use constant ENDING_LINE => 4; # the most recent line to use it
16841 use constant SAVED_COLUMN => 5; # the most recent line to use it
16842 use constant SERIAL_NUMBER => 6; # unique number for this alignment
16843 # (just its index in an array)
16845 # Correspondence between variables and array indexes
16847 $_index_map{column} = COLUMN;
16848 $_index_map{starting_column} = STARTING_COLUMN;
16849 $_index_map{matching_token} = MATCHING_TOKEN;
16850 $_index_map{starting_line} = STARTING_LINE;
16851 $_index_map{ending_line} = ENDING_LINE;
16852 $_index_map{saved_column} = SAVED_COLUMN;
16853 $_index_map{serial_number} = SERIAL_NUMBER;
16855 my @_default_data = ();
16856 $_default_data[COLUMN] = undef;
16857 $_default_data[STARTING_COLUMN] = undef;
16858 $_default_data[MATCHING_TOKEN] = undef;
16859 $_default_data[STARTING_LINE] = undef;
16860 $_default_data[ENDING_LINE] = undef;
16861 $_default_data[SAVED_COLUMN] = undef;
16862 $_default_data[SERIAL_NUMBER] = undef;
16864 # class population count
16867 sub get_count { $_count; }
16868 sub _increment_count { ++$_count }
16869 sub _decrement_count { --$_count }
16874 my ( $caller, %arg ) = @_;
16875 my $caller_is_obj = ref($caller);
16876 my $class = $caller_is_obj || $caller;
16878 my $self = bless [], $class;
16880 foreach ( keys %_index_map ) {
16881 my $index = $_index_map{$_};
16882 if ( exists $arg{$_} ) { $self->[$index] = $arg{$_} }
16883 elsif ($caller_is_obj) { $self->[$index] = $caller->[$index] }
16884 else { $self->[$index] = $_default_data[$index] }
16886 $self->_increment_count();
16891 $_[0]->_decrement_count();
16894 sub get_column { return $_[0]->[COLUMN] }
16895 sub get_starting_column { return $_[0]->[STARTING_COLUMN] }
16896 sub get_matching_token { return $_[0]->[MATCHING_TOKEN] }
16897 sub get_starting_line { return $_[0]->[STARTING_LINE] }
16898 sub get_ending_line { return $_[0]->[ENDING_LINE] }
16899 sub get_serial_number { return $_[0]->[SERIAL_NUMBER] }
16901 sub set_column { $_[0]->[COLUMN] = $_[1] }
16902 sub set_starting_column { $_[0]->[STARTING_COLUMN] = $_[1] }
16903 sub set_matching_token { $_[0]->[MATCHING_TOKEN] = $_[1] }
16904 sub set_starting_line { $_[0]->[STARTING_LINE] = $_[1] }
16905 sub set_ending_line { $_[0]->[ENDING_LINE] = $_[1] }
16906 sub increment_column { $_[0]->[COLUMN] += $_[1] }
16908 sub save_column { $_[0]->[SAVED_COLUMN] = $_[0]->[COLUMN] }
16909 sub restore_column { $_[0]->[COLUMN] = $_[0]->[SAVED_COLUMN] }
16913 package Perl::Tidy::VerticalAligner;
16915 # The Perl::Tidy::VerticalAligner package collects output lines and
16916 # attempts to line up certain common tokens, such as => and #, which are
16917 # identified by the calling routine.
16919 # There are two main routines: append_line and flush. Append acts as a
16920 # storage buffer, collecting lines into a group which can be vertically
16921 # aligned. When alignment is no longer possible or desirable, it dumps
16922 # the group to flush.
16924 # append_line -----> flush
16932 # Caution: these debug flags produce a lot of output
16933 # They should all be 0 except when debugging small scripts
16935 use constant VALIGN_DEBUG_FLAG_APPEND => 0;
16936 use constant VALIGN_DEBUG_FLAG_APPEND0 => 0;
16937 use constant VALIGN_DEBUG_FLAG_TERNARY => 0;
16939 my $debug_warning = sub {
16940 print "VALIGN_DEBUGGING with key $_[0]\n";
16943 VALIGN_DEBUG_FLAG_APPEND && $debug_warning->('APPEND');
16944 VALIGN_DEBUG_FLAG_APPEND0 && $debug_warning->('APPEND0');
16949 $vertical_aligner_self
16951 $maximum_alignment_index
16955 $previous_minimum_jmax_seen
16956 $previous_maximum_jmax_seen
16957 $maximum_line_index
16962 $last_group_level_written
16963 $last_leading_space_count
16967 $last_comment_column
16968 $last_side_comment_line_number
16969 $last_side_comment_length
16970 $last_side_comment_level
16971 $outdented_line_count
16972 $first_outdented_line_at
16973 $last_outdented_line_at
16974 $diagnostics_object
16976 $file_writer_object
16977 @side_comment_history
16978 $comment_leading_space_count
16979 $is_matching_terminal_line
16986 $cached_line_leading_space_count
16987 $cached_seqno_string
16990 $last_nonblank_seqno_string
16994 $rOpts_maximum_line_length
16995 $rOpts_continuation_indentation
16996 $rOpts_indent_columns
16998 $rOpts_entab_leading_whitespace
17001 $rOpts_minimum_space_to_comment
17009 ( $class, $rOpts, $file_writer_object, $logger_object, $diagnostics_object )
17012 # variables describing the entire space group:
17013 $ralignment_list = [];
17015 $last_group_level_written = -1;
17016 $extra_indent_ok = 0; # can we move all lines to the right?
17017 $last_side_comment_length = 0;
17018 $maximum_jmax_seen = 0;
17019 $minimum_jmax_seen = 0;
17020 $previous_minimum_jmax_seen = 0;
17021 $previous_maximum_jmax_seen = 0;
17023 # variables describing each line of the group
17024 @group_lines = (); # list of all lines in group
17026 $outdented_line_count = 0;
17027 $first_outdented_line_at = 0;
17028 $last_outdented_line_at = 0;
17029 $last_side_comment_line_number = 0;
17030 $last_side_comment_level = -1;
17031 $is_matching_terminal_line = 0;
17033 # most recent 3 side comments; [ line number, column ]
17034 $side_comment_history[0] = [ -300, 0 ];
17035 $side_comment_history[1] = [ -200, 0 ];
17036 $side_comment_history[2] = [ -100, 0 ];
17038 # write_leader_and_string cache:
17039 $cached_line_text = "";
17040 $cached_line_type = 0;
17041 $cached_line_flag = 0;
17043 $cached_line_valid = 0;
17044 $cached_line_leading_space_count = 0;
17045 $cached_seqno_string = "";
17047 # string of sequence numbers joined together
17048 $seqno_string = "";
17049 $last_nonblank_seqno_string = "";
17051 # frequently used parameters
17052 $rOpts_indent_columns = $rOpts->{'indent-columns'};
17053 $rOpts_tabs = $rOpts->{'tabs'};
17054 $rOpts_entab_leading_whitespace = $rOpts->{'entab-leading-whitespace'};
17055 $rOpts_minimum_space_to_comment = $rOpts->{'minimum-space-to-comment'};
17056 $rOpts_maximum_line_length = $rOpts->{'maximum-line-length'};
17057 $rOpts_valign = $rOpts->{'valign'};
17059 forget_side_comment();
17061 initialize_for_new_group();
17063 $vertical_aligner_self = {};
17064 bless $vertical_aligner_self, $class;
17065 return $vertical_aligner_self;
17068 sub initialize_for_new_group {
17069 $maximum_line_index = -1; # lines in the current group
17070 $maximum_alignment_index = -1; # alignments in current group
17071 $zero_count = 0; # count consecutive lines without tokens
17072 $current_line = undef; # line being matched for alignment
17073 $group_maximum_gap = 0; # largest gap introduced
17075 $marginal_match = 0;
17076 $comment_leading_space_count = 0;
17077 $last_leading_space_count = 0;
17080 # interface to Perl::Tidy::Diagnostics routines
17081 sub write_diagnostics {
17082 if ($diagnostics_object) {
17083 $diagnostics_object->write_diagnostics(@_);
17087 # interface to Perl::Tidy::Logger routines
17089 if ($logger_object) {
17090 $logger_object->warning(@_);
17094 sub write_logfile_entry {
17095 if ($logger_object) {
17096 $logger_object->write_logfile_entry(@_);
17100 sub report_definite_bug {
17101 if ($logger_object) {
17102 $logger_object->report_definite_bug();
17108 # return the number of leading spaces associated with an indentation
17109 # variable $indentation is either a constant number of spaces or an
17110 # object with a get_SPACES method.
17111 my $indentation = shift;
17112 return ref($indentation) ? $indentation->get_SPACES() : $indentation;
17115 sub get_RECOVERABLE_SPACES {
17117 # return the number of spaces (+ means shift right, - means shift left)
17118 # that we would like to shift a group of lines with the same indentation
17119 # to get them to line up with their opening parens
17120 my $indentation = shift;
17121 return ref($indentation) ? $indentation->get_RECOVERABLE_SPACES() : 0;
17124 sub get_STACK_DEPTH {
17126 my $indentation = shift;
17127 return ref($indentation) ? $indentation->get_STACK_DEPTH() : 0;
17130 sub make_alignment {
17131 my ( $col, $token ) = @_;
17133 # make one new alignment at column $col which aligns token $token
17134 ++$maximum_alignment_index;
17135 my $alignment = new Perl::Tidy::VerticalAligner::Alignment(
17137 starting_column => $col,
17138 matching_token => $token,
17139 starting_line => $maximum_line_index,
17140 ending_line => $maximum_line_index,
17141 serial_number => $maximum_alignment_index,
17143 $ralignment_list->[$maximum_alignment_index] = $alignment;
17147 sub dump_alignments {
17149 "Current Alignments:\ni\ttoken\tstarting_column\tcolumn\tstarting_line\tending_line\n";
17150 for my $i ( 0 .. $maximum_alignment_index ) {
17151 my $column = $ralignment_list->[$i]->get_column();
17152 my $starting_column = $ralignment_list->[$i]->get_starting_column();
17153 my $matching_token = $ralignment_list->[$i]->get_matching_token();
17154 my $starting_line = $ralignment_list->[$i]->get_starting_line();
17155 my $ending_line = $ralignment_list->[$i]->get_ending_line();
17157 "$i\t$matching_token\t$starting_column\t$column\t$starting_line\t$ending_line\n";
17161 sub save_alignment_columns {
17162 for my $i ( 0 .. $maximum_alignment_index ) {
17163 $ralignment_list->[$i]->save_column();
17167 sub restore_alignment_columns {
17168 for my $i ( 0 .. $maximum_alignment_index ) {
17169 $ralignment_list->[$i]->restore_column();
17173 sub forget_side_comment {
17174 $last_comment_column = 0;
17179 # sub append is called to place one line in the current vertical group.
17181 # The input parameters are:
17182 # $level = indentation level of this line
17183 # $rfields = reference to array of fields
17184 # $rpatterns = reference to array of patterns, one per field
17185 # $rtokens = reference to array of tokens starting fields 1,2,..
17187 # Here is an example of what this package does. In this example,
17188 # we are trying to line up both the '=>' and the '#'.
17190 # '18' => 'grave', # \`
17191 # '19' => 'acute', # `'
17192 # '20' => 'caron', # \v
17193 # <-tabs-><f1-><--field 2 ---><-f3->
17196 # col1 col2 col3 col4
17198 # The calling routine has already broken the entire line into 3 fields as
17199 # indicated. (So the work of identifying promising common tokens has
17200 # already been done).
17202 # In this example, there will be 2 tokens being matched: '=>' and '#'.
17203 # They are the leading parts of fields 2 and 3, but we do need to know
17204 # what they are so that we can dump a group of lines when these tokens
17207 # The fields contain the actual characters of each field. The patterns
17208 # are like the fields, but they contain mainly token types instead
17209 # of tokens, so they have fewer characters. They are used to be
17210 # sure we are matching fields of similar type.
17212 # In this example, there will be 4 column indexes being adjusted. The
17213 # first one is always at zero. The interior columns are at the start of
17214 # the matching tokens, and the last one tracks the maximum line length.
17216 # Basically, each time a new line comes in, it joins the current vertical
17217 # group if possible. Otherwise it causes the current group to be dumped
17218 # and a new group is started.
17220 # For each new group member, the column locations are increased, as
17221 # necessary, to make room for the new fields. When the group is finally
17222 # output, these column numbers are used to compute the amount of spaces of
17223 # padding needed for each field.
17225 # Programming note: the fields are assumed not to have any tab characters.
17226 # Tabs have been previously removed except for tabs in quoted strings and
17227 # side comments. Tabs in these fields can mess up the column counting.
17228 # The log file warns the user if there are any such tabs.
17231 $level, $level_end,
17232 $indentation, $rfields,
17233 $rtokens, $rpatterns,
17234 $is_forced_break, $outdent_long_lines,
17235 $is_terminal_ternary, $is_terminal_statement,
17236 $do_not_pad, $rvertical_tightness_flags,
17240 # number of fields is $jmax
17241 # number of tokens between fields is $jmax-1
17242 my $jmax = $#{$rfields};
17244 my $leading_space_count = get_SPACES($indentation);
17246 # set outdented flag to be sure we either align within statements or
17247 # across statement boundaries, but not both.
17248 my $is_outdented = $last_leading_space_count > $leading_space_count;
17249 $last_leading_space_count = $leading_space_count;
17251 # Patch: undo for hanging side comment
17252 my $is_hanging_side_comment =
17253 ( $jmax == 1 && $rtokens->[0] eq '#' && $rfields->[0] =~ /^\s*$/ );
17254 $is_outdented = 0 if $is_hanging_side_comment;
17256 VALIGN_DEBUG_FLAG_APPEND0 && do {
17258 "APPEND0: entering lines=$maximum_line_index new #fields= $jmax, leading_count=$leading_space_count last_cmt=$last_comment_column force=$is_forced_break\n";
17261 # Validate cached line if necessary: If we can produce a container
17262 # with just 2 lines total by combining an existing cached opening
17263 # token with the closing token to follow, then we will mark both
17264 # cached flags as valid.
17265 if ($rvertical_tightness_flags) {
17266 if ( $maximum_line_index <= 0
17267 && $cached_line_type
17269 && $rvertical_tightness_flags->[2]
17270 && $rvertical_tightness_flags->[2] == $cached_seqno )
17272 $rvertical_tightness_flags->[3] ||= 1;
17273 $cached_line_valid ||= 1;
17277 # do not join an opening block brace with an unbalanced line
17278 # unless requested with a flag value of 2
17279 if ( $cached_line_type == 3
17280 && $maximum_line_index < 0
17281 && $cached_line_flag < 2
17282 && $level_jump != 0 )
17284 $cached_line_valid = 0;
17287 # patch until new aligner is finished
17288 if ($do_not_pad) { my_flush() }
17290 # shouldn't happen:
17291 if ( $level < 0 ) { $level = 0 }
17293 # do not align code across indentation level changes
17294 # or if vertical alignment is turned off for debugging
17295 if ( $level != $group_level || $is_outdented || !$rOpts_valign ) {
17297 # we are allowed to shift a group of lines to the right if its
17298 # level is greater than the previous and next group
17300 ( $level < $group_level && $last_group_level_written < $group_level );
17304 # If we know that this line will get flushed out by itself because
17305 # of level changes, we can leave the extra_indent_ok flag set.
17306 # That way, if we get an external flush call, we will still be
17307 # able to do some -lp alignment if necessary.
17308 $extra_indent_ok = ( $is_terminal_statement && $level > $group_level );
17310 $group_level = $level;
17312 # wait until after the above flush to get the leading space
17313 # count because it may have been changed if the -icp flag is in
17315 $leading_space_count = get_SPACES($indentation);
17319 # --------------------------------------------------------------------
17320 # Patch to collect outdentable block COMMENTS
17321 # --------------------------------------------------------------------
17322 my $is_blank_line = "";
17323 my $is_block_comment = ( $jmax == 0 && $rfields->[0] =~ /^#/ );
17324 if ( $group_type eq 'COMMENT' ) {
17328 && $outdent_long_lines
17329 && $leading_space_count == $comment_leading_space_count
17334 $group_lines[ ++$maximum_line_index ] = $rfields->[0];
17342 # --------------------------------------------------------------------
17343 # add dummy fields for terminal ternary
17344 # --------------------------------------------------------------------
17345 my $j_terminal_match;
17346 if ( $is_terminal_ternary && $current_line ) {
17347 $j_terminal_match =
17348 fix_terminal_ternary( $rfields, $rtokens, $rpatterns );
17349 $jmax = @{$rfields} - 1;
17352 # --------------------------------------------------------------------
17353 # add dummy fields for else statement
17354 # --------------------------------------------------------------------
17355 if ( $rfields->[0] =~ /^else\s*$/
17357 && $level_jump == 0 )
17359 $j_terminal_match = fix_terminal_else( $rfields, $rtokens, $rpatterns );
17360 $jmax = @{$rfields} - 1;
17363 # --------------------------------------------------------------------
17364 # Step 1. Handle simple line of code with no fields to match.
17365 # --------------------------------------------------------------------
17366 if ( $jmax <= 0 ) {
17369 if ( $maximum_line_index >= 0
17370 && !get_RECOVERABLE_SPACES( $group_lines[0]->get_indentation() ) )
17373 # flush the current group if it has some aligned columns..
17374 if ( $group_lines[0]->get_jmax() > 1 ) { my_flush() }
17376 # flush current group if we are just collecting side comments..
17379 # ...and we haven't seen a comment lately
17380 ( $zero_count > 3 )
17382 # ..or if this new line doesn't fit to the left of the comments
17383 || ( ( $leading_space_count + length( $$rfields[0] ) ) >
17384 $group_lines[0]->get_column(0) )
17391 # patch to start new COMMENT group if this comment may be outdented
17392 if ( $is_block_comment
17393 && $outdent_long_lines
17394 && $maximum_line_index < 0 )
17396 $group_type = 'COMMENT';
17397 $comment_leading_space_count = $leading_space_count;
17398 $group_lines[ ++$maximum_line_index ] = $rfields->[0];
17402 # just write this line directly if no current group, no side comment,
17403 # and no space recovery is needed.
17404 if ( $maximum_line_index < 0 && !get_RECOVERABLE_SPACES($indentation) )
17406 write_leader_and_string( $leading_space_count, $$rfields[0], 0,
17407 $outdent_long_lines, $rvertical_tightness_flags );
17415 # programming check: (shouldn't happen)
17416 # an error here implies an incorrect call was made
17417 if ( $jmax > 0 && ( $#{$rtokens} != ( $jmax - 1 ) ) ) {
17419 "Program bug in Perl::Tidy::VerticalAligner - number of tokens = $#{$rtokens} should be one less than number of fields: $#{$rfields})\n"
17421 report_definite_bug();
17424 # --------------------------------------------------------------------
17425 # create an object to hold this line
17426 # --------------------------------------------------------------------
17427 my $new_line = new Perl::Tidy::VerticalAligner::Line(
17429 jmax_original_line => $jmax,
17430 rtokens => $rtokens,
17431 rfields => $rfields,
17432 rpatterns => $rpatterns,
17433 indentation => $indentation,
17434 leading_space_count => $leading_space_count,
17435 outdent_long_lines => $outdent_long_lines,
17437 is_hanging_side_comment => $is_hanging_side_comment,
17438 maximum_line_length => $rOpts->{'maximum-line-length'},
17439 rvertical_tightness_flags => $rvertical_tightness_flags,
17442 # Initialize a global flag saying if the last line of the group should
17443 # match end of group and also terminate the group. There should be no
17444 # returns between here and where the flag is handled at the bottom.
17445 my $col_matching_terminal = 0;
17446 if ( defined($j_terminal_match) ) {
17448 # remember the column of the terminal ? or { to match with
17449 $col_matching_terminal = $current_line->get_column($j_terminal_match);
17451 # set global flag for sub decide_if_aligned
17452 $is_matching_terminal_line = 1;
17455 # --------------------------------------------------------------------
17456 # It simplifies things to create a zero length side comment
17458 # --------------------------------------------------------------------
17459 make_side_comment( $new_line, $level_end );
17461 # --------------------------------------------------------------------
17462 # Decide if this is a simple list of items.
17463 # There are 3 list types: none, comma, comma-arrow.
17464 # We use this below to be less restrictive in deciding what to align.
17465 # --------------------------------------------------------------------
17466 if ($is_forced_break) {
17467 decide_if_list($new_line);
17470 if ($current_line) {
17472 # --------------------------------------------------------------------
17473 # Allow hanging side comment to join current group, if any
17474 # This will help keep side comments aligned, because otherwise we
17475 # will have to start a new group, making alignment less likely.
17476 # --------------------------------------------------------------------
17477 join_hanging_comment( $new_line, $current_line )
17478 if $is_hanging_side_comment;
17480 # --------------------------------------------------------------------
17481 # If there is just one previous line, and it has more fields
17482 # than the new line, try to join fields together to get a match with
17483 # the new line. At the present time, only a single leading '=' is
17484 # allowed to be compressed out. This is useful in rare cases where
17485 # a table is forced to use old breakpoints because of side comments,
17486 # and the table starts out something like this:
17487 # my %MonthChars = ('0', 'Jan', # side comment
17490 # Eliminating the '=' field will allow the remaining fields to line up.
17491 # This situation does not occur if there are no side comments
17492 # because scan_list would put a break after the opening '('.
17493 # --------------------------------------------------------------------
17494 eliminate_old_fields( $new_line, $current_line );
17496 # --------------------------------------------------------------------
17497 # If the new line has more fields than the current group,
17498 # see if we can match the first fields and combine the remaining
17499 # fields of the new line.
17500 # --------------------------------------------------------------------
17501 eliminate_new_fields( $new_line, $current_line );
17503 # --------------------------------------------------------------------
17504 # Flush previous group unless all common tokens and patterns match..
17505 # --------------------------------------------------------------------
17506 check_match( $new_line, $current_line );
17508 # --------------------------------------------------------------------
17509 # See if there is space for this line in the current group (if any)
17510 # --------------------------------------------------------------------
17511 if ($current_line) {
17512 check_fit( $new_line, $current_line );
17516 # --------------------------------------------------------------------
17517 # Append this line to the current group (or start new group)
17518 # --------------------------------------------------------------------
17519 accept_line($new_line);
17521 # Future update to allow this to vary:
17522 $current_line = $new_line if ( $maximum_line_index == 0 );
17524 # output this group if it ends in a terminal else or ternary line
17525 if ( defined($j_terminal_match) ) {
17527 # if there is only one line in the group (maybe due to failure to match
17528 # perfectly with previous lines), then align the ? or { of this
17529 # terminal line with the previous one unless that would make the line
17531 if ( $maximum_line_index == 0 ) {
17532 my $col_now = $current_line->get_column($j_terminal_match);
17533 my $pad = $col_matching_terminal - $col_now;
17534 my $padding_available =
17535 $current_line->get_available_space_on_right();
17536 if ( $pad > 0 && $pad <= $padding_available ) {
17537 $current_line->increase_field_width( $j_terminal_match, $pad );
17541 $is_matching_terminal_line = 0;
17544 # --------------------------------------------------------------------
17545 # Step 8. Some old debugging stuff
17546 # --------------------------------------------------------------------
17547 VALIGN_DEBUG_FLAG_APPEND && do {
17548 print "APPEND fields:";
17549 dump_array(@$rfields);
17550 print "APPEND tokens:";
17551 dump_array(@$rtokens);
17552 print "APPEND patterns:";
17553 dump_array(@$rpatterns);
17560 sub join_hanging_comment {
17563 my $jmax = $line->get_jmax();
17564 return 0 unless $jmax == 1; # must be 2 fields
17565 my $rtokens = $line->get_rtokens();
17566 return 0 unless $$rtokens[0] eq '#'; # the second field is a comment..
17567 my $rfields = $line->get_rfields();
17568 return 0 unless $$rfields[0] =~ /^\s*$/; # the first field is empty...
17569 my $old_line = shift;
17570 my $maximum_field_index = $old_line->get_jmax();
17572 unless $maximum_field_index > $jmax; # the current line has more fields
17573 my $rpatterns = $line->get_rpatterns();
17575 $line->set_is_hanging_side_comment(1);
17576 $jmax = $maximum_field_index;
17577 $line->set_jmax($jmax);
17578 $$rfields[$jmax] = $$rfields[1];
17579 $$rtokens[ $jmax - 1 ] = $$rtokens[0];
17580 $$rpatterns[ $jmax - 1 ] = $$rpatterns[0];
17581 for ( my $j = 1 ; $j < $jmax ; $j++ ) {
17582 $$rfields[$j] = " "; # NOTE: caused glitch unless 1 blank, why?
17583 $$rtokens[ $j - 1 ] = "";
17584 $$rpatterns[ $j - 1 ] = "";
17589 sub eliminate_old_fields {
17591 my $new_line = shift;
17592 my $jmax = $new_line->get_jmax();
17593 if ( $jmax > $maximum_jmax_seen ) { $maximum_jmax_seen = $jmax }
17594 if ( $jmax < $minimum_jmax_seen ) { $minimum_jmax_seen = $jmax }
17596 # there must be one previous line
17597 return unless ( $maximum_line_index == 0 );
17599 my $old_line = shift;
17600 my $maximum_field_index = $old_line->get_jmax();
17602 # this line must have fewer fields
17603 return unless $maximum_field_index > $jmax;
17605 # Identify specific cases where field elimination is allowed:
17606 # case=1: both lines have comma-separated lists, and the first
17607 # line has an equals
17608 # case=2: both lines have leading equals
17610 # case 1 is the default
17613 # See if case 2: both lines have leading '='
17614 # We'll require smiliar leading patterns in this case
17615 my $old_rtokens = $old_line->get_rtokens();
17616 my $rtokens = $new_line->get_rtokens();
17617 my $rpatterns = $new_line->get_rpatterns();
17618 my $old_rpatterns = $old_line->get_rpatterns();
17619 if ( $rtokens->[0] =~ /^=\d*$/
17620 && $old_rtokens->[0] eq $rtokens->[0]
17621 && $old_rpatterns->[0] eq $rpatterns->[0] )
17626 # not too many fewer fields in new line for case 1
17627 return unless ( $case != 1 || $maximum_field_index - 2 <= $jmax );
17629 # case 1 must have side comment
17630 my $old_rfields = $old_line->get_rfields();
17633 && length( $$old_rfields[$maximum_field_index] ) == 0 );
17635 my $rfields = $new_line->get_rfields();
17637 my $hid_equals = 0;
17639 my @new_alignments = ();
17640 my @new_fields = ();
17641 my @new_matching_patterns = ();
17642 my @new_matching_tokens = ();
17646 my $current_field = '';
17647 my $current_pattern = '';
17649 # loop over all old tokens
17651 for ( $k = 0 ; $k < $maximum_field_index ; $k++ ) {
17652 $current_field .= $$old_rfields[$k];
17653 $current_pattern .= $$old_rpatterns[$k];
17654 last if ( $j > $jmax - 1 );
17656 if ( $$old_rtokens[$k] eq $$rtokens[$j] ) {
17658 $new_fields[$j] = $current_field;
17659 $new_matching_patterns[$j] = $current_pattern;
17660 $current_field = '';
17661 $current_pattern = '';
17662 $new_matching_tokens[$j] = $$old_rtokens[$k];
17663 $new_alignments[$j] = $old_line->get_alignment($k);
17668 if ( $$old_rtokens[$k] =~ /^\=\d*$/ ) {
17669 last if ( $case == 2 ); # avoid problems with stuff
17670 # like: $a=$b=$c=$d;
17674 if ( $in_match && $case == 1 )
17675 ; # disallow gaps in matching field types in case 1
17679 # Modify the current state if we are successful.
17680 # We must exactly reach the ends of both lists for success.
17681 if ( ( $j == $jmax )
17682 && ( $current_field eq '' )
17683 && ( $case != 1 || $hid_equals ) )
17685 $k = $maximum_field_index;
17686 $current_field .= $$old_rfields[$k];
17687 $current_pattern .= $$old_rpatterns[$k];
17688 $new_fields[$j] = $current_field;
17689 $new_matching_patterns[$j] = $current_pattern;
17691 $new_alignments[$j] = $old_line->get_alignment($k);
17692 $maximum_field_index = $j;
17694 $old_line->set_alignments(@new_alignments);
17695 $old_line->set_jmax($jmax);
17696 $old_line->set_rtokens( \@new_matching_tokens );
17697 $old_line->set_rfields( \@new_fields );
17698 $old_line->set_rpatterns( \@$rpatterns );
17702 # create an empty side comment if none exists
17703 sub make_side_comment {
17704 my $new_line = shift;
17705 my $level_end = shift;
17706 my $jmax = $new_line->get_jmax();
17707 my $rtokens = $new_line->get_rtokens();
17709 # if line does not have a side comment...
17710 if ( ( $jmax == 0 ) || ( $$rtokens[ $jmax - 1 ] ne '#' ) ) {
17711 my $rfields = $new_line->get_rfields();
17712 my $rpatterns = $new_line->get_rpatterns();
17713 $$rtokens[$jmax] = '#';
17714 $$rfields[ ++$jmax ] = '';
17715 $$rpatterns[$jmax] = '#';
17716 $new_line->set_jmax($jmax);
17717 $new_line->set_jmax_original_line($jmax);
17720 # line has a side comment..
17723 # don't remember old side comment location for very long
17724 my $line_number = $vertical_aligner_self->get_output_line_number();
17725 my $rfields = $new_line->get_rfields();
17727 $line_number - $last_side_comment_line_number > 12
17729 # and don't remember comment location across block level changes
17730 || ( $level_end < $last_side_comment_level && $$rfields[0] =~ /^}/ )
17733 forget_side_comment();
17735 $last_side_comment_line_number = $line_number;
17736 $last_side_comment_level = $level_end;
17740 sub decide_if_list {
17744 # A list will be taken to be a line with a forced break in which all
17745 # of the field separators are commas or comma-arrows (except for the
17748 # List separator tokens are things like ',3' or '=>2',
17749 # where the trailing digit is the nesting depth. Allow braces
17750 # to allow nested list items.
17751 my $rtokens = $line->get_rtokens();
17752 my $test_token = $$rtokens[0];
17753 if ( $test_token =~ /^(\,|=>)/ ) {
17754 my $list_type = $test_token;
17755 my $jmax = $line->get_jmax();
17757 foreach ( 1 .. $jmax - 2 ) {
17758 if ( $$rtokens[$_] !~ /^(\,|=>|\{)/ ) {
17763 $line->set_list_type($list_type);
17767 sub eliminate_new_fields {
17769 return unless ( $maximum_line_index >= 0 );
17770 my ( $new_line, $old_line ) = @_;
17771 my $jmax = $new_line->get_jmax();
17773 my $old_rtokens = $old_line->get_rtokens();
17774 my $rtokens = $new_line->get_rtokens();
17775 my $is_assignment =
17776 ( $rtokens->[0] =~ /^=\d*$/ && ( $old_rtokens->[0] eq $rtokens->[0] ) );
17778 # must be monotonic variation
17779 return unless ( $is_assignment || $previous_maximum_jmax_seen <= $jmax );
17781 # must be more fields in the new line
17782 my $maximum_field_index = $old_line->get_jmax();
17783 return unless ( $maximum_field_index < $jmax );
17785 unless ($is_assignment) {
17787 unless ( $old_line->get_jmax_original_line() == $minimum_jmax_seen )
17788 ; # only if monotonic
17790 # never combine fields of a comma list
17792 unless ( $maximum_field_index > 1 )
17793 && ( $new_line->get_list_type() !~ /^,/ );
17796 my $rfields = $new_line->get_rfields();
17797 my $rpatterns = $new_line->get_rpatterns();
17798 my $old_rpatterns = $old_line->get_rpatterns();
17800 # loop over all OLD tokens except comment and check match
17803 for ( $k = 0 ; $k < $maximum_field_index - 1 ; $k++ ) {
17804 if ( ( $$old_rtokens[$k] ne $$rtokens[$k] )
17805 || ( $$old_rpatterns[$k] ne $$rpatterns[$k] ) )
17812 # first tokens agree, so combine extra new tokens
17814 for $k ( $maximum_field_index .. $jmax - 1 ) {
17816 $$rfields[ $maximum_field_index - 1 ] .= $$rfields[$k];
17817 $$rfields[$k] = "";
17818 $$rpatterns[ $maximum_field_index - 1 ] .= $$rpatterns[$k];
17819 $$rpatterns[$k] = "";
17822 $$rtokens[ $maximum_field_index - 1 ] = '#';
17823 $$rfields[$maximum_field_index] = $$rfields[$jmax];
17824 $$rpatterns[$maximum_field_index] = $$rpatterns[$jmax];
17825 $jmax = $maximum_field_index;
17827 $new_line->set_jmax($jmax);
17830 sub fix_terminal_ternary {
17832 # Add empty fields as necessary to align a ternary term
17837 # : $year % 100 ? 1
17838 # : $year % 400 ? 0
17841 # returns 1 if the terminal item should be indented
17843 my ( $rfields, $rtokens, $rpatterns ) = @_;
17845 my $jmax = @{$rfields} - 1;
17846 my $old_line = $group_lines[$maximum_line_index];
17847 my $rfields_old = $old_line->get_rfields();
17849 my $rpatterns_old = $old_line->get_rpatterns();
17850 my $rtokens_old = $old_line->get_rtokens();
17851 my $maximum_field_index = $old_line->get_jmax();
17853 # look for the question mark after the :
17855 my $depth_question;
17857 for ( my $j = 0 ; $j < $maximum_field_index ; $j++ ) {
17858 my $tok = $rtokens_old->[$j];
17859 if ( $tok =~ /^\?(\d+)$/ ) {
17860 $depth_question = $1;
17862 # depth must be correct
17863 next unless ( $depth_question eq $group_level );
17866 if ( $rfields_old->[ $j + 1 ] =~ /^(\?\s*)/ ) {
17867 $pad = " " x length($1);
17870 return; # shouldn't happen
17875 return unless ( defined($jquestion) ); # shouldn't happen
17877 # Now splice the tokens and patterns of the previous line
17878 # into the else line to insure a match. Add empty fields
17880 my $jadd = $jquestion;
17882 # Work on copies of the actual arrays in case we have
17883 # to return due to an error
17884 my @fields = @{$rfields};
17885 my @patterns = @{$rpatterns};
17886 my @tokens = @{$rtokens};
17888 VALIGN_DEBUG_FLAG_TERNARY && do {
17890 print "CURRENT FIELDS=<@{$rfields_old}>\n";
17891 print "CURRENT TOKENS=<@{$rtokens_old}>\n";
17892 print "CURRENT PATTERNS=<@{$rpatterns_old}>\n";
17893 print "UNMODIFIED FIELDS=<@{$rfields}>\n";
17894 print "UNMODIFIED TOKENS=<@{$rtokens}>\n";
17895 print "UNMODIFIED PATTERNS=<@{$rpatterns}>\n";
17898 # handle cases of leading colon on this line
17899 if ( $fields[0] =~ /^(:\s*)(.*)$/ ) {
17901 my ( $colon, $therest ) = ( $1, $2 );
17903 # Handle sub-case of first field with leading colon plus additional code
17904 # This is the usual situation as at the '1' below:
17906 # : $year % 400 ? 0
17910 # Split the first field after the leading colon and insert padding.
17911 # Note that this padding will remain even if the terminal value goes
17912 # out on a separate line. This does not seem to look to bad, so no
17913 # mechanism has been included to undo it.
17914 my $field1 = shift @fields;
17915 unshift @fields, ( $colon, $pad . $therest );
17917 # change the leading pattern from : to ?
17918 return unless ( $patterns[0] =~ s/^\:/?/ );
17920 # install leading tokens and patterns of existing line
17921 unshift( @tokens, @{$rtokens_old}[ 0 .. $jquestion ] );
17922 unshift( @patterns, @{$rpatterns_old}[ 0 .. $jquestion ] );
17924 # insert appropriate number of empty fields
17925 splice( @fields, 1, 0, ('') x $jadd ) if $jadd;
17928 # handle sub-case of first field just equal to leading colon.
17929 # This can happen for example in the example below where
17930 # the leading '(' would create a new alignment token
17931 # : ( $name =~ /[]}]$/ ) ? ( $mname = $name )
17932 # : ( $mname = $name . '->' );
17935 return unless ( $jmax > 0 && $tokens[0] ne '#' ); # shouldn't happen
17937 # prepend a leading ? onto the second pattern
17938 $patterns[1] = "?b" . $patterns[1];
17940 # pad the second field
17941 $fields[1] = $pad . $fields[1];
17943 # install leading tokens and patterns of existing line, replacing
17944 # leading token and inserting appropriate number of empty fields
17945 splice( @tokens, 0, 1, @{$rtokens_old}[ 0 .. $jquestion ] );
17946 splice( @patterns, 1, 0, @{$rpatterns_old}[ 1 .. $jquestion ] );
17947 splice( @fields, 1, 0, ('') x $jadd ) if $jadd;
17951 # Handle case of no leading colon on this line. This will
17952 # be the case when -wba=':' is used. For example,
17953 # $year % 400 ? 0 :
17957 # install leading tokens and patterns of existing line
17958 $patterns[0] = '?' . 'b' . $patterns[0];
17959 unshift( @tokens, @{$rtokens_old}[ 0 .. $jquestion ] );
17960 unshift( @patterns, @{$rpatterns_old}[ 0 .. $jquestion ] );
17962 # insert appropriate number of empty fields
17963 $jadd = $jquestion + 1;
17964 $fields[0] = $pad . $fields[0];
17965 splice( @fields, 0, 0, ('') x $jadd ) if $jadd;
17968 VALIGN_DEBUG_FLAG_TERNARY && do {
17970 print "MODIFIED TOKENS=<@tokens>\n";
17971 print "MODIFIED PATTERNS=<@patterns>\n";
17972 print "MODIFIED FIELDS=<@fields>\n";
17975 # all ok .. update the arrays
17976 @{$rfields} = @fields;
17977 @{$rtokens} = @tokens;
17978 @{$rpatterns} = @patterns;
17980 # force a flush after this line
17984 sub fix_terminal_else {
17986 # Add empty fields as necessary to align a balanced terminal
17987 # else block to a previous if/elsif/unless block,
17990 # if ( 1 || $x ) { print "ok 13\n"; }
17991 # else { print "not ok 13\n"; }
17993 # returns 1 if the else block should be indented
17995 my ( $rfields, $rtokens, $rpatterns ) = @_;
17996 my $jmax = @{$rfields} - 1;
17997 return unless ( $jmax > 0 );
17999 # check for balanced else block following if/elsif/unless
18000 my $rfields_old = $current_line->get_rfields();
18002 # TBD: add handling for 'case'
18003 return unless ( $rfields_old->[0] =~ /^(if|elsif|unless)\s*$/ );
18005 # look for the opening brace after the else, and extrace the depth
18006 my $tok_brace = $rtokens->[0];
18008 if ( $tok_brace =~ /^\{(\d+)$/ ) { $depth_brace = $1; }
18010 # probably: "else # side_comment"
18013 my $rpatterns_old = $current_line->get_rpatterns();
18014 my $rtokens_old = $current_line->get_rtokens();
18015 my $maximum_field_index = $current_line->get_jmax();
18017 # be sure the previous if/elsif is followed by an opening paren
18019 my $tok_paren = '(' . $depth_brace;
18020 my $tok_test = $rtokens_old->[$jparen];
18021 return unless ( $tok_test eq $tok_paren ); # shouldn't happen
18023 # Now find the opening block brace
18025 for ( my $j = 1 ; $j < $maximum_field_index ; $j++ ) {
18026 my $tok = $rtokens_old->[$j];
18027 if ( $tok eq $tok_brace ) {
18032 return unless ( defined($jbrace) ); # shouldn't happen
18034 # Now splice the tokens and patterns of the previous line
18035 # into the else line to insure a match. Add empty fields
18037 my $jadd = $jbrace - $jparen;
18038 splice( @{$rtokens}, 0, 0, @{$rtokens_old}[ $jparen .. $jbrace - 1 ] );
18039 splice( @{$rpatterns}, 1, 0, @{$rpatterns_old}[ $jparen + 1 .. $jbrace ] );
18040 splice( @{$rfields}, 1, 0, ('') x $jadd );
18042 # force a flush after this line if it does not follow a case
18044 unless ( $rfields_old->[0] =~ /^case\s*$/ );
18049 my $new_line = shift;
18050 my $old_line = shift;
18052 # uses global variables:
18053 # $previous_minimum_jmax_seen
18054 # $maximum_jmax_seen
18055 # $maximum_line_index
18057 my $jmax = $new_line->get_jmax();
18058 my $maximum_field_index = $old_line->get_jmax();
18060 # flush if this line has too many fields
18061 if ( $jmax > $maximum_field_index ) { my_flush(); return }
18063 # flush if adding this line would make a non-monotonic field count
18065 ( $maximum_field_index > $jmax ) # this has too few fields
18067 ( $previous_minimum_jmax_seen < $jmax ) # and wouldn't be monotonic
18068 || ( $old_line->get_jmax_original_line() != $maximum_jmax_seen )
18076 # otherwise append this line if everything matches
18077 my $jmax_original_line = $new_line->get_jmax_original_line();
18078 my $is_hanging_side_comment = $new_line->get_is_hanging_side_comment();
18079 my $rtokens = $new_line->get_rtokens();
18080 my $rfields = $new_line->get_rfields();
18081 my $rpatterns = $new_line->get_rpatterns();
18082 my $list_type = $new_line->get_list_type();
18084 my $group_list_type = $old_line->get_list_type();
18085 my $old_rpatterns = $old_line->get_rpatterns();
18086 my $old_rtokens = $old_line->get_rtokens();
18088 my $jlimit = $jmax - 1;
18089 if ( $maximum_field_index > $jmax ) {
18090 $jlimit = $jmax_original_line;
18091 --$jlimit unless ( length( $new_line->get_rfields()->[$jmax] ) );
18094 my $everything_matches = 1;
18096 # common list types always match
18097 unless ( ( $group_list_type && ( $list_type eq $group_list_type ) )
18098 || $is_hanging_side_comment )
18101 my $leading_space_count = $new_line->get_leading_space_count();
18102 my $saw_equals = 0;
18103 for my $j ( 0 .. $jlimit ) {
18106 my $old_tok = $$old_rtokens[$j];
18107 my $new_tok = $$rtokens[$j];
18109 # Dumb down the match AFTER an equals and
18110 # also dumb down after seeing a ? ternary operator ...
18111 # Everything after a + is the token which preceded the previous
18112 # opening paren (container name). We won't require them to match.
18113 if ( $saw_equals && $new_tok =~ /(.*)\+/ ) {
18115 $old_tok =~ s/\+.*$//;
18118 if ( $new_tok =~ /^[\?=]\d*$/ ) { $saw_equals = 1 }
18120 # we never match if the matching tokens differ
18122 && $old_tok ne $new_tok )
18127 # otherwise, if patterns match, we always have a match.
18128 # However, if patterns don't match, we have to be careful...
18129 elsif ( $$old_rpatterns[$j] ne $$rpatterns[$j] ) {
18131 # We have to be very careful about aligning commas when the
18132 # pattern's don't match, because it can be worse to create an
18133 # alignment where none is needed than to omit one. The current
18134 # rule: if we are within a matching sub call (indicated by '+'
18135 # in the matching token), we'll allow a marginal match, but
18138 # Here's an example where we'd like to align the '='
18139 # my $cfile = File::Spec->catfile( 't', 'callext.c' );
18140 # my $inc = File::Spec->catdir( 'Basic', 'Core' );
18141 # because the function names differ.
18142 # Future alignment logic should make this unnecessary.
18144 # Here's an example where the ','s are not contained in a call.
18145 # The first line below should probably not match the next two:
18146 # ( $a, $b ) = ( $b, $r );
18147 # ( $x1, $x2 ) = ( $x2 - $q * $x1, $x1 );
18148 # ( $y1, $y2 ) = ( $y2 - $q * $y1, $y1 );
18149 if ( $new_tok =~ /^,/ ) {
18150 if ( $$rtokens[$j] =~ /[A-Za-z]/ ) {
18151 $marginal_match = 1;
18158 # parens don't align well unless patterns match
18159 elsif ( $new_tok =~ /^\(/ ) {
18163 # Handle an '=' alignment with different patterns to
18165 elsif ( $new_tok =~ /^=\d*$/ ) {
18169 # It is best to be a little restrictive when
18170 # aligning '=' tokens. Here is an example of
18171 # two lines that we will not align:
18174 # The problem is that one is a 'my' declaration,
18175 # and the other isn't, so they're not very similar.
18176 # We will filter these out by comparing the first
18177 # letter of the pattern. This is crude, but works
18180 substr( $$old_rpatterns[$j], 0, 1 ) ne
18181 substr( $$rpatterns[$j], 0, 1 ) )
18186 # If we pass that test, we'll call it a marginal match.
18187 # Here is an example of a marginal match:
18189 # $op = compile_bblock($op);
18190 # The left tokens are both identifiers, but
18191 # one accesses a hash and the other doesn't.
18192 # We'll let this be a tentative match and undo
18193 # it later if we don't find more than 2 lines
18195 elsif ( $maximum_line_index == 0 ) {
18196 $marginal_match = 1;
18201 # Don't let line with fewer fields increase column widths
18203 if ( $maximum_field_index > $jmax ) {
18205 length( $$rfields[$j] ) - $old_line->current_field_width($j);
18208 $pad += $leading_space_count;
18211 # TESTING: suspend this rule to allow last lines to join
18212 if ( $pad > 0 ) { $match = 0; }
18216 $everything_matches = 0;
18222 if ( $maximum_field_index > $jmax ) {
18224 if ($everything_matches) {
18226 my $comment = $$rfields[$jmax];
18227 for $jmax ( $jlimit .. $maximum_field_index ) {
18228 $$rtokens[$jmax] = $$old_rtokens[$jmax];
18229 $$rfields[ ++$jmax ] = '';
18230 $$rpatterns[$jmax] = $$old_rpatterns[$jmax];
18232 $$rfields[$jmax] = $comment;
18233 $new_line->set_jmax($jmax);
18237 my_flush() unless ($everything_matches);
18242 return unless ( $maximum_line_index >= 0 );
18243 my $new_line = shift;
18244 my $old_line = shift;
18246 my $jmax = $new_line->get_jmax();
18247 my $leading_space_count = $new_line->get_leading_space_count();
18248 my $is_hanging_side_comment = $new_line->get_is_hanging_side_comment();
18249 my $rtokens = $new_line->get_rtokens();
18250 my $rfields = $new_line->get_rfields();
18251 my $rpatterns = $new_line->get_rpatterns();
18253 my $group_list_type = $group_lines[0]->get_list_type();
18255 my $padding_so_far = 0;
18256 my $padding_available = $old_line->get_available_space_on_right();
18258 # save current columns in case this doesn't work
18259 save_alignment_columns();
18261 my ( $j, $pad, $eight );
18262 my $maximum_field_index = $old_line->get_jmax();
18263 for $j ( 0 .. $jmax ) {
18265 $pad = length( $$rfields[$j] ) - $old_line->current_field_width($j);
18268 $pad += $leading_space_count;
18271 # remember largest gap of the group, excluding gap to side comment
18273 && $group_maximum_gap < -$pad
18275 && $j < $jmax - 1 )
18277 $group_maximum_gap = -$pad;
18282 ## This patch helps sometimes, but it doesn't check to see if
18283 ## the line is too long even without the side comment. It needs
18285 ##don't let a long token with no trailing side comment push
18286 ##side comments out, or end a group. (sidecmt1.t)
18287 ##next if ($j==$jmax-1 && length($$rfields[$jmax])==0);
18289 # This line will need space; lets see if we want to accept it..
18292 # not if this won't fit
18293 ( $pad > $padding_available )
18295 # previously, there were upper bounds placed on padding here
18296 # (maximum_whitespace_columns), but they were not really helpful
18301 # revert to starting state then flush; things didn't work out
18302 restore_alignment_columns();
18307 # patch to avoid excessive gaps in previous lines,
18308 # due to a line of fewer fields.
18309 # return join( ".",
18310 # $self->{"dfi"}, $self->{"aa"}, $self->rsvd, $self->{"rd"},
18311 # $self->{"area"}, $self->{"id"}, $self->{"sel"} );
18312 next if ( $jmax < $maximum_field_index && $j == $jmax - 1 );
18314 # looks ok, squeeze this field in
18315 $old_line->increase_field_width( $j, $pad );
18316 $padding_available -= $pad;
18318 # remember largest gap of the group, excluding gap to side comment
18319 if ( $pad > $group_maximum_gap && $j > 0 && $j < $jmax - 1 ) {
18320 $group_maximum_gap = $pad;
18327 # The current line either starts a new alignment group or is
18328 # accepted into the current alignment group.
18329 my $new_line = shift;
18330 $group_lines[ ++$maximum_line_index ] = $new_line;
18332 # initialize field lengths if starting new group
18333 if ( $maximum_line_index == 0 ) {
18335 my $jmax = $new_line->get_jmax();
18336 my $rfields = $new_line->get_rfields();
18337 my $rtokens = $new_line->get_rtokens();
18339 my $col = $new_line->get_leading_space_count();
18341 for $j ( 0 .. $jmax ) {
18342 $col += length( $$rfields[$j] );
18344 # create initial alignments for the new group
18346 if ( $j < $jmax ) { $token = $$rtokens[$j] }
18347 my $alignment = make_alignment( $col, $token );
18348 $new_line->set_alignment( $j, $alignment );
18351 $maximum_jmax_seen = $jmax;
18352 $minimum_jmax_seen = $jmax;
18355 # use previous alignments otherwise
18357 my @new_alignments =
18358 $group_lines[ $maximum_line_index - 1 ]->get_alignments();
18359 $new_line->set_alignments(@new_alignments);
18362 # remember group jmax extremes for next call to append_line
18363 $previous_minimum_jmax_seen = $minimum_jmax_seen;
18364 $previous_maximum_jmax_seen = $maximum_jmax_seen;
18369 # debug routine to dump array contents
18374 # flush() sends the current Perl::Tidy::VerticalAligner group down the
18375 # pipeline to Perl::Tidy::FileWriter.
18377 # This is the external flush, which also empties the cache
18380 if ( $maximum_line_index < 0 ) {
18381 if ($cached_line_type) {
18382 $seqno_string = $cached_seqno_string;
18383 entab_and_output( $cached_line_text,
18384 $cached_line_leading_space_count,
18385 $last_group_level_written );
18386 $cached_line_type = 0;
18387 $cached_line_text = "";
18388 $cached_seqno_string = "";
18396 # This is the internal flush, which leaves the cache intact
18399 return if ( $maximum_line_index < 0 );
18401 # handle a group of comment lines
18402 if ( $group_type eq 'COMMENT' ) {
18404 VALIGN_DEBUG_FLAG_APPEND0 && do {
18405 my ( $a, $b, $c ) = caller();
18407 "APPEND0: Flush called from $a $b $c for COMMENT group: lines=$maximum_line_index \n";
18410 my $leading_space_count = $comment_leading_space_count;
18411 my $leading_string = get_leading_string($leading_space_count);
18413 # zero leading space count if any lines are too long
18414 my $max_excess = 0;
18415 for my $i ( 0 .. $maximum_line_index ) {
18416 my $str = $group_lines[$i];
18418 length($str) + $leading_space_count - $rOpts_maximum_line_length;
18419 if ( $excess > $max_excess ) {
18420 $max_excess = $excess;
18424 if ( $max_excess > 0 ) {
18425 $leading_space_count -= $max_excess;
18426 if ( $leading_space_count < 0 ) { $leading_space_count = 0 }
18427 $last_outdented_line_at =
18428 $file_writer_object->get_output_line_number();
18429 unless ($outdented_line_count) {
18430 $first_outdented_line_at = $last_outdented_line_at;
18432 $outdented_line_count += ( $maximum_line_index + 1 );
18435 # write the group of lines
18436 my $outdent_long_lines = 0;
18437 for my $i ( 0 .. $maximum_line_index ) {
18438 write_leader_and_string( $leading_space_count, $group_lines[$i], 0,
18439 $outdent_long_lines, "" );
18443 # handle a group of code lines
18446 VALIGN_DEBUG_FLAG_APPEND0 && do {
18447 my $group_list_type = $group_lines[0]->get_list_type();
18448 my ( $a, $b, $c ) = caller();
18449 my $maximum_field_index = $group_lines[0]->get_jmax();
18451 "APPEND0: Flush called from $a $b $c fields=$maximum_field_index list=$group_list_type lines=$maximum_line_index extra=$extra_indent_ok\n";
18455 # some small groups are best left unaligned
18456 my $do_not_align = decide_if_aligned();
18458 # optimize side comment location
18459 $do_not_align = adjust_side_comment($do_not_align);
18461 # recover spaces for -lp option if possible
18462 my $extra_leading_spaces = get_extra_leading_spaces();
18464 # all lines of this group have the same basic leading spacing
18465 my $group_leader_length = $group_lines[0]->get_leading_space_count();
18467 # add extra leading spaces if helpful
18468 my $min_ci_gap = improve_continuation_indentation( $do_not_align,
18469 $group_leader_length );
18471 # loop to output all lines
18472 for my $i ( 0 .. $maximum_line_index ) {
18473 my $line = $group_lines[$i];
18474 write_vertically_aligned_line( $line, $min_ci_gap, $do_not_align,
18475 $group_leader_length, $extra_leading_spaces );
18478 initialize_for_new_group();
18481 sub decide_if_aligned {
18483 # Do not try to align two lines which are not really similar
18484 return unless $maximum_line_index == 1;
18485 return if ($is_matching_terminal_line);
18487 my $group_list_type = $group_lines[0]->get_list_type();
18489 my $do_not_align = (
18491 # always align lists
18496 # don't align if it was just a marginal match
18499 # don't align two lines with big gap
18500 || $group_maximum_gap > 12
18502 # or lines with differing number of alignment tokens
18503 # TODO: this could be improved. It occasionally rejects
18505 || $previous_maximum_jmax_seen != $previous_minimum_jmax_seen
18509 # But try to convert them into a simple comment group if the first line
18510 # a has side comment
18511 my $rfields = $group_lines[0]->get_rfields();
18512 my $maximum_field_index = $group_lines[0]->get_jmax();
18514 && ( $maximum_line_index > 0 )
18515 && ( length( $$rfields[$maximum_field_index] ) > 0 ) )
18520 return $do_not_align;
18523 sub adjust_side_comment {
18525 my $do_not_align = shift;
18527 # let's see if we can move the side comment field out a little
18528 # to improve readability (the last field is always a side comment field)
18529 my $have_side_comment = 0;
18530 my $first_side_comment_line = -1;
18531 my $maximum_field_index = $group_lines[0]->get_jmax();
18532 for my $i ( 0 .. $maximum_line_index ) {
18533 my $line = $group_lines[$i];
18535 if ( length( $line->get_rfields()->[$maximum_field_index] ) ) {
18536 $have_side_comment = 1;
18537 $first_side_comment_line = $i;
18542 my $kmax = $maximum_field_index + 1;
18544 if ($have_side_comment) {
18546 my $line = $group_lines[0];
18548 # the maximum space without exceeding the line length:
18549 my $avail = $line->get_available_space_on_right();
18551 # try to use the previous comment column
18552 my $side_comment_column = $line->get_column( $kmax - 2 );
18553 my $move = $last_comment_column - $side_comment_column;
18555 ## my $sc_line0 = $side_comment_history[0]->[0];
18556 ## my $sc_col0 = $side_comment_history[0]->[1];
18557 ## my $sc_line1 = $side_comment_history[1]->[0];
18558 ## my $sc_col1 = $side_comment_history[1]->[1];
18559 ## my $sc_line2 = $side_comment_history[2]->[0];
18560 ## my $sc_col2 = $side_comment_history[2]->[1];
18562 ## # FUTURE UPDATES:
18563 ## # Be sure to ignore 'do not align' and '} # end comments'
18564 ## # Find first $move > 0 and $move <= $avail as follows:
18565 ## # 1. try sc_col1 if sc_col1 == sc_col0 && (line-sc_line0) < 12
18566 ## # 2. try sc_col2 if (line-sc_line2) < 12
18567 ## # 3. try min possible space, plus up to 8,
18568 ## # 4. try min possible space
18570 if ( $kmax > 0 && !$do_not_align ) {
18572 # but if this doesn't work, give up and use the minimum space
18573 if ( $move > $avail ) {
18574 $move = $rOpts_minimum_space_to_comment - 1;
18577 # but we want some minimum space to the comment
18578 my $min_move = $rOpts_minimum_space_to_comment - 1;
18580 && $last_side_comment_length > 0
18581 && ( $first_side_comment_line == 0 )
18582 && $group_level == $last_group_level_written )
18587 if ( $move < $min_move ) {
18591 # prevously, an upper bound was placed on $move here,
18592 # (maximum_space_to_comment), but it was not helpful
18594 # don't exceed the available space
18595 if ( $move > $avail ) { $move = $avail }
18597 # we can only increase space, never decrease
18599 $line->increase_field_width( $maximum_field_index - 1, $move );
18602 # remember this column for the next group
18603 $last_comment_column = $line->get_column( $kmax - 2 );
18607 # try to at least line up the existing side comment location
18608 if ( $kmax > 0 && $move > 0 && $move < $avail ) {
18609 $line->increase_field_width( $maximum_field_index - 1, $move );
18613 # reset side comment column if we can't align
18615 forget_side_comment();
18619 return $do_not_align;
18622 sub improve_continuation_indentation {
18623 my ( $do_not_align, $group_leader_length ) = @_;
18625 # See if we can increase the continuation indentation
18626 # to move all continuation lines closer to the next field
18627 # (unless it is a comment).
18629 # '$min_ci_gap'is the extra indentation that we may need to introduce.
18630 # We will only introduce this to fields which already have some ci.
18631 # Without this variable, we would occasionally get something like this
18634 # use overload '+' => \&plus,
18636 # '*' => \&multiply,
18639 # 'atan2' => \&atan2,
18641 # Whereas with this variable, we can shift variables over to get this:
18643 # use overload '+' => \&plus,
18645 # '*' => \&multiply,
18648 # 'atan2' => \&atan2,
18650 ## BUB: Deactivated####################
18651 # The trouble with this patch is that it may, for example,
18652 # move in some 'or's or ':'s, and leave some out, so that the
18653 # left edge alignment suffers.
18655 ###########################################
18657 my $maximum_field_index = $group_lines[0]->get_jmax();
18659 my $min_ci_gap = $rOpts_maximum_line_length;
18660 if ( $maximum_field_index > 1 && !$do_not_align ) {
18662 for my $i ( 0 .. $maximum_line_index ) {
18663 my $line = $group_lines[$i];
18664 my $leading_space_count = $line->get_leading_space_count();
18665 my $rfields = $line->get_rfields();
18668 $line->get_column(0) -
18669 $leading_space_count -
18670 length( $$rfields[0] );
18672 if ( $leading_space_count > $group_leader_length ) {
18673 if ( $gap < $min_ci_gap ) { $min_ci_gap = $gap }
18677 if ( $min_ci_gap >= $rOpts_maximum_line_length ) {
18684 return $min_ci_gap;
18687 sub write_vertically_aligned_line {
18689 my ( $line, $min_ci_gap, $do_not_align, $group_leader_length,
18690 $extra_leading_spaces )
18692 my $rfields = $line->get_rfields();
18693 my $leading_space_count = $line->get_leading_space_count();
18694 my $outdent_long_lines = $line->get_outdent_long_lines();
18695 my $maximum_field_index = $line->get_jmax();
18696 my $rvertical_tightness_flags = $line->get_rvertical_tightness_flags();
18698 # add any extra spaces
18699 if ( $leading_space_count > $group_leader_length ) {
18700 $leading_space_count += $min_ci_gap;
18703 my $str = $$rfields[0];
18705 # loop to concatenate all fields of this line and needed padding
18706 my $total_pad_count = 0;
18708 for $j ( 1 .. $maximum_field_index ) {
18710 # skip zero-length side comments
18712 if ( ( $j == $maximum_field_index )
18713 && ( !defined( $$rfields[$j] ) || ( length( $$rfields[$j] ) == 0 ) )
18716 # compute spaces of padding before this field
18717 my $col = $line->get_column( $j - 1 );
18718 $pad = $col - ( length($str) + $leading_space_count );
18720 if ($do_not_align) {
18722 ( $j < $maximum_field_index )
18724 : $rOpts_minimum_space_to_comment - 1;
18727 # accumulate the padding
18728 if ( $pad > 0 ) { $total_pad_count += $pad; }
18731 if ( !defined $$rfields[$j] ) {
18732 write_diagnostics("UNDEFined field at j=$j\n");
18735 # only add padding when we have a finite field;
18736 # this avoids extra terminal spaces if we have empty fields
18737 if ( length( $$rfields[$j] ) > 0 ) {
18738 $str .= ' ' x $total_pad_count;
18739 $total_pad_count = 0;
18740 $str .= $$rfields[$j];
18743 $total_pad_count = 0;
18746 # update side comment history buffer
18747 if ( $j == $maximum_field_index ) {
18748 my $lineno = $file_writer_object->get_output_line_number();
18749 shift @side_comment_history;
18750 push @side_comment_history, [ $lineno, $col ];
18754 my $side_comment_length = ( length( $$rfields[$maximum_field_index] ) );
18756 # ship this line off
18757 write_leader_and_string( $leading_space_count + $extra_leading_spaces,
18758 $str, $side_comment_length, $outdent_long_lines,
18759 $rvertical_tightness_flags );
18762 sub get_extra_leading_spaces {
18764 #----------------------------------------------------------
18765 # Define any extra indentation space (for the -lp option).
18767 # If a list has side comments, sub scan_list must dump the
18768 # list before it sees everything. When this happens, it sets
18769 # the indentation to the standard scheme, but notes how
18770 # many spaces it would have liked to use. We may be able
18771 # to recover that space here in the event that that all of the
18772 # lines of a list are back together again.
18773 #----------------------------------------------------------
18775 my $extra_leading_spaces = 0;
18776 if ($extra_indent_ok) {
18777 my $object = $group_lines[0]->get_indentation();
18778 if ( ref($object) ) {
18779 my $extra_indentation_spaces_wanted =
18780 get_RECOVERABLE_SPACES($object);
18782 # all indentation objects must be the same
18784 for $i ( 1 .. $maximum_line_index ) {
18785 if ( $object != $group_lines[$i]->get_indentation() ) {
18786 $extra_indentation_spaces_wanted = 0;
18791 if ($extra_indentation_spaces_wanted) {
18793 # the maximum space without exceeding the line length:
18794 my $avail = $group_lines[0]->get_available_space_on_right();
18795 $extra_leading_spaces =
18796 ( $avail > $extra_indentation_spaces_wanted )
18797 ? $extra_indentation_spaces_wanted
18800 # update the indentation object because with -icp the terminal
18801 # ');' will use the same adjustment.
18802 $object->permanently_decrease_AVAILABLE_SPACES(
18803 -$extra_leading_spaces );
18807 return $extra_leading_spaces;
18810 sub combine_fields {
18812 # combine all fields except for the comment field ( sidecmt.t )
18813 # Uses global variables:
18815 # $maximum_line_index
18817 my $maximum_field_index = $group_lines[0]->get_jmax();
18818 for ( $j = 0 ; $j <= $maximum_line_index ; $j++ ) {
18819 my $line = $group_lines[$j];
18820 my $rfields = $line->get_rfields();
18821 foreach ( 1 .. $maximum_field_index - 1 ) {
18822 $$rfields[0] .= $$rfields[$_];
18824 $$rfields[1] = $$rfields[$maximum_field_index];
18826 $line->set_jmax(1);
18827 $line->set_column( 0, 0 );
18828 $line->set_column( 1, 0 );
18831 $maximum_field_index = 1;
18833 for $j ( 0 .. $maximum_line_index ) {
18834 my $line = $group_lines[$j];
18835 my $rfields = $line->get_rfields();
18836 for $k ( 0 .. $maximum_field_index ) {
18837 my $pad = length( $$rfields[$k] ) - $line->current_field_width($k);
18839 $pad += $group_lines[$j]->get_leading_space_count();
18842 if ( $pad > 0 ) { $line->increase_field_width( $k, $pad ) }
18848 sub get_output_line_number {
18850 # the output line number reported to a caller is the number of items
18851 # written plus the number of items in the buffer
18853 1 + $maximum_line_index + $file_writer_object->get_output_line_number();
18856 sub write_leader_and_string {
18858 my ( $leading_space_count, $str, $side_comment_length, $outdent_long_lines,
18859 $rvertical_tightness_flags )
18862 # handle outdenting of long lines:
18863 if ($outdent_long_lines) {
18866 $side_comment_length +
18867 $leading_space_count -
18868 $rOpts_maximum_line_length;
18869 if ( $excess > 0 ) {
18870 $leading_space_count = 0;
18871 $last_outdented_line_at =
18872 $file_writer_object->get_output_line_number();
18874 unless ($outdented_line_count) {
18875 $first_outdented_line_at = $last_outdented_line_at;
18877 $outdented_line_count++;
18881 # Make preliminary leading whitespace. It could get changed
18882 # later by entabbing, so we have to keep track of any changes
18883 # to the leading_space_count from here on.
18884 my $leading_string =
18885 $leading_space_count > 0 ? ( ' ' x $leading_space_count ) : "";
18887 # Unpack any recombination data; it was packed by
18888 # sub send_lines_to_vertical_aligner. Contents:
18890 # [0] type: 1=opening 2=closing 3=opening block brace
18891 # [1] flag: if opening: 1=no multiple steps, 2=multiple steps ok
18892 # if closing: spaces of padding to use
18893 # [2] sequence number of container
18894 # [3] valid flag: do not append if this flag is false
18896 my ( $open_or_close, $tightness_flag, $seqno, $valid, $seqno_beg,
18898 if ($rvertical_tightness_flags) {
18900 $open_or_close, $tightness_flag, $seqno, $valid, $seqno_beg,
18902 ) = @{$rvertical_tightness_flags};
18905 $seqno_string = $seqno_end;
18907 # handle any cached line ..
18908 # either append this line to it or write it out
18909 if ( length($cached_line_text) ) {
18911 if ( !$cached_line_valid ) {
18912 entab_and_output( $cached_line_text,
18913 $cached_line_leading_space_count,
18914 $last_group_level_written );
18917 # handle cached line with opening container token
18918 elsif ( $cached_line_type == 1 || $cached_line_type == 3 ) {
18920 my $gap = $leading_space_count - length($cached_line_text);
18922 # handle option of just one tight opening per line:
18923 if ( $cached_line_flag == 1 ) {
18924 if ( defined($open_or_close) && $open_or_close == 1 ) {
18930 $leading_string = $cached_line_text . ' ' x $gap;
18931 $leading_space_count = $cached_line_leading_space_count;
18932 $seqno_string = $cached_seqno_string . ':' . $seqno_beg;
18935 entab_and_output( $cached_line_text,
18936 $cached_line_leading_space_count,
18937 $last_group_level_written );
18941 # handle cached line to place before this closing container token
18943 my $test_line = $cached_line_text . ' ' x $cached_line_flag . $str;
18945 if ( length($test_line) <= $rOpts_maximum_line_length ) {
18947 $seqno_string = $cached_seqno_string . ':' . $seqno_beg;
18949 # Patch to outdent closing tokens ending # in ');'
18950 # If we are joining a line like ');' to a previous stacked
18951 # set of closing tokens, then decide if we may outdent the
18952 # combined stack to the indentation of the ');'. Since we
18953 # should not normally outdent any of the other tokens more than
18954 # the indentation of the lines that contained them, we will
18955 # only do this if all of the corresponding opening
18956 # tokens were on the same line. This can happen with
18957 # -sot and -sct. For example, it is ok here:
18958 # __PACKAGE__->load_components( qw(
18963 # But, for example, we do not outdent in this example because
18964 # that would put the closing sub brace out farther than the
18965 # opening sub brace:
18967 # perltidy -sot -sct
18969 # '<Control-f>' => sub {
18971 # my $e = $c->XEvent;
18972 # itemsUnderArea $c;
18975 if ( $str =~ /^\);/ && $cached_line_text =~ /^[\)\}\]\s]*$/ ) {
18977 # The way to tell this is if the stacked sequence numbers
18978 # of this output line are the reverse of the stacked
18979 # sequence numbers of the previous non-blank line of
18980 # sequence numbers. So we can join if the previous
18981 # nonblank string of tokens is the mirror image. For
18982 # example if stack )}] is 13:8:6 then we are looking for a
18983 # leading stack like [{( which is 6:8:13 We only need to
18984 # check the two ends, because the intermediate tokens must
18985 # fall in order. Note on speed: having to split on colons
18986 # and eliminate multiple colons might appear to be slow,
18987 # but it's not an issue because we almost never come
18988 # through here. In a typical file we don't.
18989 $seqno_string =~ s/^:+//;
18990 $last_nonblank_seqno_string =~ s/^:+//;
18991 $seqno_string =~ s/:+/:/g;
18992 $last_nonblank_seqno_string =~ s/:+/:/g;
18994 # how many spaces can we outdent?
18996 $cached_line_leading_space_count - $leading_space_count;
18998 && length($seqno_string)
18999 && length($last_nonblank_seqno_string) ==
19000 length($seqno_string) )
19003 ( split ':', $last_nonblank_seqno_string );
19004 my @seqno_now = ( split ':', $seqno_string );
19005 if ( $seqno_now[-1] == $seqno_last[0]
19006 && $seqno_now[0] == $seqno_last[-1] )
19010 # for absolute safety, be sure we only remove
19012 my $ws = substr( $test_line, 0, $diff );
19013 if ( ( length($ws) == $diff ) && $ws =~ /^\s+$/ ) {
19015 $test_line = substr( $test_line, $diff );
19016 $cached_line_leading_space_count -= $diff;
19019 # shouldn't happen, but not critical:
19021 ## ERROR transferring indentation here
19028 $leading_string = "";
19029 $leading_space_count = $cached_line_leading_space_count;
19032 entab_and_output( $cached_line_text,
19033 $cached_line_leading_space_count,
19034 $last_group_level_written );
19038 $cached_line_type = 0;
19039 $cached_line_text = "";
19041 # make the line to be written
19042 my $line = $leading_string . $str;
19044 # write or cache this line
19045 if ( !$open_or_close || $side_comment_length > 0 ) {
19046 entab_and_output( $line, $leading_space_count, $group_level );
19049 $cached_line_text = $line;
19050 $cached_line_type = $open_or_close;
19051 $cached_line_flag = $tightness_flag;
19052 $cached_seqno = $seqno;
19053 $cached_line_valid = $valid;
19054 $cached_line_leading_space_count = $leading_space_count;
19055 $cached_seqno_string = $seqno_string;
19058 $last_group_level_written = $group_level;
19059 $last_side_comment_length = $side_comment_length;
19060 $extra_indent_ok = 0;
19063 sub entab_and_output {
19064 my ( $line, $leading_space_count, $level ) = @_;
19066 # The line is currently correct if there is no tabbing (recommended!)
19067 # We may have to lop off some leading spaces and replace with tabs.
19068 if ( $leading_space_count > 0 ) {
19070 # Nothing to do if no tabs
19071 if ( !( $rOpts_tabs || $rOpts_entab_leading_whitespace )
19072 || $rOpts_indent_columns <= 0 )
19078 # Handle entab option
19079 elsif ($rOpts_entab_leading_whitespace) {
19081 $leading_space_count % $rOpts_entab_leading_whitespace;
19083 int( $leading_space_count / $rOpts_entab_leading_whitespace );
19084 my $leading_string = "\t" x $tab_count . ' ' x $space_count;
19085 if ( $line =~ /^\s{$leading_space_count,$leading_space_count}/ ) {
19086 substr( $line, 0, $leading_space_count ) = $leading_string;
19090 # REMOVE AFTER TESTING
19091 # shouldn't happen - program error counting whitespace
19092 # we'll skip entabbing
19094 "Error entabbing in entab_and_output: expected count=$leading_space_count\n"
19099 # Handle option of one tab per level
19101 my $leading_string = ( "\t" x $level );
19103 $leading_space_count - $level * $rOpts_indent_columns;
19105 # shouldn't happen:
19106 if ( $space_count < 0 ) {
19108 "Error entabbing in append_line: for level=$group_level count=$leading_space_count\n"
19110 $leading_string = ( ' ' x $leading_space_count );
19113 $leading_string .= ( ' ' x $space_count );
19115 if ( $line =~ /^\s{$leading_space_count,$leading_space_count}/ ) {
19116 substr( $line, 0, $leading_space_count ) = $leading_string;
19120 # REMOVE AFTER TESTING
19121 # shouldn't happen - program error counting whitespace
19122 # we'll skip entabbing
19124 "Error entabbing in entab_and_output: expected count=$leading_space_count\n"
19129 $file_writer_object->write_code_line( $line . "\n" );
19130 if ($seqno_string) {
19131 $last_nonblank_seqno_string = $seqno_string;
19135 { # begin get_leading_string
19137 my @leading_string_cache;
19139 sub get_leading_string {
19141 # define the leading whitespace string for this line..
19142 my $leading_whitespace_count = shift;
19144 # Handle case of zero whitespace, which includes multi-line quotes
19145 # (which may have a finite level; this prevents tab problems)
19146 if ( $leading_whitespace_count <= 0 ) {
19150 # look for previous result
19151 elsif ( $leading_string_cache[$leading_whitespace_count] ) {
19152 return $leading_string_cache[$leading_whitespace_count];
19155 # must compute a string for this number of spaces
19156 my $leading_string;
19158 # Handle simple case of no tabs
19159 if ( !( $rOpts_tabs || $rOpts_entab_leading_whitespace )
19160 || $rOpts_indent_columns <= 0 )
19162 $leading_string = ( ' ' x $leading_whitespace_count );
19165 # Handle entab option
19166 elsif ($rOpts_entab_leading_whitespace) {
19168 $leading_whitespace_count % $rOpts_entab_leading_whitespace;
19169 my $tab_count = int(
19170 $leading_whitespace_count / $rOpts_entab_leading_whitespace );
19171 $leading_string = "\t" x $tab_count . ' ' x $space_count;
19174 # Handle option of one tab per level
19176 $leading_string = ( "\t" x $group_level );
19178 $leading_whitespace_count - $group_level * $rOpts_indent_columns;
19180 # shouldn't happen:
19181 if ( $space_count < 0 ) {
19183 "Error in append_line: for level=$group_level count=$leading_whitespace_count\n"
19185 $leading_string = ( ' ' x $leading_whitespace_count );
19188 $leading_string .= ( ' ' x $space_count );
19191 $leading_string_cache[$leading_whitespace_count] = $leading_string;
19192 return $leading_string;
19194 } # end get_leading_string
19196 sub report_anything_unusual {
19198 if ( $outdented_line_count > 0 ) {
19199 write_logfile_entry(
19200 "$outdented_line_count long lines were outdented:\n");
19201 write_logfile_entry(
19202 " First at output line $first_outdented_line_at\n");
19204 if ( $outdented_line_count > 1 ) {
19205 write_logfile_entry(
19206 " Last at output line $last_outdented_line_at\n");
19208 write_logfile_entry(
19209 " use -noll to prevent outdenting, -l=n to increase line length\n"
19211 write_logfile_entry("\n");
19215 #####################################################################
19217 # the Perl::Tidy::FileWriter class writes the output file
19219 #####################################################################
19221 package Perl::Tidy::FileWriter;
19223 # Maximum number of little messages; probably need not be changed.
19224 use constant MAX_NAG_MESSAGES => 6;
19226 sub write_logfile_entry {
19228 my $logger_object = $self->{_logger_object};
19229 if ($logger_object) {
19230 $logger_object->write_logfile_entry(@_);
19236 my ( $line_sink_object, $rOpts, $logger_object ) = @_;
19239 _line_sink_object => $line_sink_object,
19240 _logger_object => $logger_object,
19242 _output_line_number => 1,
19243 _consecutive_blank_lines => 0,
19244 _consecutive_nonblank_lines => 0,
19245 _first_line_length_error => 0,
19246 _max_line_length_error => 0,
19247 _last_line_length_error => 0,
19248 _first_line_length_error_at => 0,
19249 _max_line_length_error_at => 0,
19250 _last_line_length_error_at => 0,
19251 _line_length_error_count => 0,
19252 _max_output_line_length => 0,
19253 _max_output_line_length_at => 0,
19259 $self->{_line_sink_object}->tee_on();
19264 $self->{_line_sink_object}->tee_off();
19267 sub get_output_line_number {
19269 return $self->{_output_line_number};
19272 sub decrement_output_line_number {
19274 $self->{_output_line_number}--;
19277 sub get_consecutive_nonblank_lines {
19279 return $self->{_consecutive_nonblank_lines};
19282 sub reset_consecutive_blank_lines {
19284 $self->{_consecutive_blank_lines} = 0;
19287 sub want_blank_line {
19289 unless ( $self->{_consecutive_blank_lines} ) {
19290 $self->write_blank_code_line();
19294 sub write_blank_code_line {
19296 my $rOpts = $self->{_rOpts};
19298 if ( $self->{_consecutive_blank_lines} >=
19299 $rOpts->{'maximum-consecutive-blank-lines'} );
19300 $self->{_consecutive_blank_lines}++;
19301 $self->{_consecutive_nonblank_lines} = 0;
19302 $self->write_line("\n");
19305 sub write_code_line {
19309 if ( $a =~ /^\s*$/ ) {
19310 my $rOpts = $self->{_rOpts};
19312 if ( $self->{_consecutive_blank_lines} >=
19313 $rOpts->{'maximum-consecutive-blank-lines'} );
19314 $self->{_consecutive_blank_lines}++;
19315 $self->{_consecutive_nonblank_lines} = 0;
19318 $self->{_consecutive_blank_lines} = 0;
19319 $self->{_consecutive_nonblank_lines}++;
19321 $self->write_line($a);
19328 # TODO: go through and see if the test is necessary here
19329 if ( $a =~ /\n$/ ) { $self->{_output_line_number}++; }
19331 $self->{_line_sink_object}->write_line($a);
19333 # This calculation of excess line length ignores any internal tabs
19334 my $rOpts = $self->{_rOpts};
19335 my $exceed = length($a) - $rOpts->{'maximum-line-length'} - 1;
19336 if ( $a =~ /^\t+/g ) {
19337 $exceed += pos($a) * ( $rOpts->{'indent-columns'} - 1 );
19340 # Note that we just incremented output line number to future value
19341 # so we must subtract 1 for current line number
19342 if ( length($a) > 1 + $self->{_max_output_line_length} ) {
19343 $self->{_max_output_line_length} = length($a) - 1;
19344 $self->{_max_output_line_length_at} = $self->{_output_line_number} - 1;
19347 if ( $exceed > 0 ) {
19348 my $output_line_number = $self->{_output_line_number};
19349 $self->{_last_line_length_error} = $exceed;
19350 $self->{_last_line_length_error_at} = $output_line_number - 1;
19351 if ( $self->{_line_length_error_count} == 0 ) {
19352 $self->{_first_line_length_error} = $exceed;
19353 $self->{_first_line_length_error_at} = $output_line_number - 1;
19357 $self->{_last_line_length_error} > $self->{_max_line_length_error} )
19359 $self->{_max_line_length_error} = $exceed;
19360 $self->{_max_line_length_error_at} = $output_line_number - 1;
19363 if ( $self->{_line_length_error_count} < MAX_NAG_MESSAGES ) {
19364 $self->write_logfile_entry(
19365 "Line length exceeded by $exceed characters\n");
19367 $self->{_line_length_error_count}++;
19372 sub report_line_length_errors {
19374 my $rOpts = $self->{_rOpts};
19375 my $line_length_error_count = $self->{_line_length_error_count};
19376 if ( $line_length_error_count == 0 ) {
19377 $self->write_logfile_entry(
19378 "No lines exceeded $rOpts->{'maximum-line-length'} characters\n");
19379 my $max_output_line_length = $self->{_max_output_line_length};
19380 my $max_output_line_length_at = $self->{_max_output_line_length_at};
19381 $self->write_logfile_entry(
19382 " Maximum output line length was $max_output_line_length at line $max_output_line_length_at\n"
19388 my $word = ( $line_length_error_count > 1 ) ? "s" : "";
19389 $self->write_logfile_entry(
19390 "$line_length_error_count output line$word exceeded $rOpts->{'maximum-line-length'} characters:\n"
19393 $word = ( $line_length_error_count > 1 ) ? "First" : "";
19394 my $first_line_length_error = $self->{_first_line_length_error};
19395 my $first_line_length_error_at = $self->{_first_line_length_error_at};
19396 $self->write_logfile_entry(
19397 " $word at line $first_line_length_error_at by $first_line_length_error characters\n"
19400 if ( $line_length_error_count > 1 ) {
19401 my $max_line_length_error = $self->{_max_line_length_error};
19402 my $max_line_length_error_at = $self->{_max_line_length_error_at};
19403 my $last_line_length_error = $self->{_last_line_length_error};
19404 my $last_line_length_error_at = $self->{_last_line_length_error_at};
19405 $self->write_logfile_entry(
19406 " Maximum at line $max_line_length_error_at by $max_line_length_error characters\n"
19408 $self->write_logfile_entry(
19409 " Last at line $last_line_length_error_at by $last_line_length_error characters\n"
19415 #####################################################################
19417 # The Perl::Tidy::Debugger class shows line tokenization
19419 #####################################################################
19421 package Perl::Tidy::Debugger;
19425 my ( $class, $filename ) = @_;
19428 _debug_file => $filename,
19429 _debug_file_opened => 0,
19434 sub really_open_debug_file {
19437 my $debug_file = $self->{_debug_file};
19439 unless ( $fh = IO::File->new("> $debug_file") ) {
19440 warn("can't open $debug_file: $!\n");
19442 $self->{_debug_file_opened} = 1;
19443 $self->{_fh} = $fh;
19445 "Use -dump-token-types (-dtt) to get a list of token type codes\n";
19448 sub close_debug_file {
19451 my $fh = $self->{_fh};
19452 if ( $self->{_debug_file_opened} ) {
19454 eval { $self->{_fh}->close() };
19458 sub write_debug_entry {
19460 # This is a debug dump routine which may be modified as necessary
19461 # to dump tokens on a line-by-line basis. The output will be written
19462 # to the .DEBUG file when the -D flag is entered.
19464 my $line_of_tokens = shift;
19466 my $input_line = $line_of_tokens->{_line_text};
19467 my $rtoken_type = $line_of_tokens->{_rtoken_type};
19468 my $rtokens = $line_of_tokens->{_rtokens};
19469 my $rlevels = $line_of_tokens->{_rlevels};
19470 my $rslevels = $line_of_tokens->{_rslevels};
19471 my $rblock_type = $line_of_tokens->{_rblock_type};
19472 my $input_line_number = $line_of_tokens->{_line_number};
19473 my $line_type = $line_of_tokens->{_line_type};
19477 my $token_str = "$input_line_number: ";
19478 my $reconstructed_original = "$input_line_number: ";
19479 my $block_str = "$input_line_number: ";
19481 #$token_str .= "$line_type: ";
19482 #$reconstructed_original .= "$line_type: ";
19485 my @next_char = ( '"', '"' );
19487 unless ( $self->{_debug_file_opened} ) { $self->really_open_debug_file() }
19488 my $fh = $self->{_fh};
19490 for ( $j = 0 ; $j < @$rtoken_type ; $j++ ) {
19493 if ( $$rtoken_type[$j] eq 'k' ) {
19494 $pattern .= $$rtokens[$j];
19497 $pattern .= $$rtoken_type[$j];
19499 $reconstructed_original .= $$rtokens[$j];
19500 $block_str .= "($$rblock_type[$j])";
19501 $num = length( $$rtokens[$j] );
19502 my $type_str = $$rtoken_type[$j];
19504 # be sure there are no blank tokens (shouldn't happen)
19505 # This can only happen if a programming error has been made
19506 # because all valid tokens are non-blank
19507 if ( $type_str eq ' ' ) {
19508 print $fh "BLANK TOKEN on the next line\n";
19509 $type_str = $next_char[$i_next];
19510 $i_next = 1 - $i_next;
19513 if ( length($type_str) == 1 ) {
19514 $type_str = $type_str x $num;
19516 $token_str .= $type_str;
19519 # Write what you want here ...
19520 # print $fh "$input_line\n";
19521 # print $fh "$pattern\n";
19522 print $fh "$reconstructed_original\n";
19523 print $fh "$token_str\n";
19525 #print $fh "$block_str\n";
19528 #####################################################################
19530 # The Perl::Tidy::LineBuffer class supplies a 'get_line()'
19531 # method for returning the next line to be parsed, as well as a
19532 # 'peek_ahead()' method
19534 # The input parameter is an object with a 'get_line()' method
19535 # which returns the next line to be parsed
19537 #####################################################################
19539 package Perl::Tidy::LineBuffer;
19544 my $line_source_object = shift;
19547 _line_source_object => $line_source_object,
19548 _rlookahead_buffer => [],
19554 my $buffer_index = shift;
19556 my $line_source_object = $self->{_line_source_object};
19557 my $rlookahead_buffer = $self->{_rlookahead_buffer};
19558 if ( $buffer_index < scalar(@$rlookahead_buffer) ) {
19559 $line = $$rlookahead_buffer[$buffer_index];
19562 $line = $line_source_object->get_line();
19563 push( @$rlookahead_buffer, $line );
19571 my $line_source_object = $self->{_line_source_object};
19572 my $rlookahead_buffer = $self->{_rlookahead_buffer};
19574 if ( scalar(@$rlookahead_buffer) ) {
19575 $line = shift @$rlookahead_buffer;
19578 $line = $line_source_object->get_line();
19583 ########################################################################
19585 # the Perl::Tidy::Tokenizer package is essentially a filter which
19586 # reads lines of perl source code from a source object and provides
19587 # corresponding tokenized lines through its get_line() method. Lines
19588 # flow from the source_object to the caller like this:
19590 # source_object --> LineBuffer_object --> Tokenizer --> calling routine
19591 # get_line() get_line() get_line() line_of_tokens
19593 # The source object can be any object with a get_line() method which
19594 # supplies one line (a character string) perl call.
19595 # The LineBuffer object is created by the Tokenizer.
19596 # The Tokenizer returns a reference to a data structure 'line_of_tokens'
19597 # containing one tokenized line for each call to its get_line() method.
19599 # WARNING: This is not a real class yet. Only one tokenizer my be used.
19601 ########################################################################
19603 package Perl::Tidy::Tokenizer;
19607 # Caution: these debug flags produce a lot of output
19608 # They should all be 0 except when debugging small scripts
19610 use constant TOKENIZER_DEBUG_FLAG_EXPECT => 0;
19611 use constant TOKENIZER_DEBUG_FLAG_NSCAN => 0;
19612 use constant TOKENIZER_DEBUG_FLAG_QUOTE => 0;
19613 use constant TOKENIZER_DEBUG_FLAG_SCAN_ID => 0;
19614 use constant TOKENIZER_DEBUG_FLAG_TOKENIZE => 0;
19616 my $debug_warning = sub {
19617 print "TOKENIZER_DEBUGGING with key $_[0]\n";
19620 TOKENIZER_DEBUG_FLAG_EXPECT && $debug_warning->('EXPECT');
19621 TOKENIZER_DEBUG_FLAG_NSCAN && $debug_warning->('NSCAN');
19622 TOKENIZER_DEBUG_FLAG_QUOTE && $debug_warning->('QUOTE');
19623 TOKENIZER_DEBUG_FLAG_SCAN_ID && $debug_warning->('SCAN_ID');
19624 TOKENIZER_DEBUG_FLAG_TOKENIZE && $debug_warning->('TOKENIZE');
19630 # PACKAGE VARIABLES for for processing an entire FILE.
19634 $last_nonblank_token
19635 $last_nonblank_type
19636 $last_nonblank_block_type
19644 %user_function_prototype
19646 %is_block_list_function
19647 %saw_function_definition
19651 $square_bracket_depth
19654 @nesting_sequence_number
19655 @current_sequence_number
19657 @paren_semicolon_count
19658 @paren_structural_type
19660 @brace_structural_type
19661 @brace_statement_type
19664 @square_bracket_type
19665 @square_bracket_structural_type
19667 @starting_line_of_current_depth
19670 # GLOBAL CONSTANTS for routines in this package
19672 %is_indirect_object_taker
19674 %expecting_operator_token
19675 %expecting_operator_types
19676 %expecting_term_types
19677 %expecting_term_token
19679 %is_file_test_operator
19681 %is_valid_token_type
19683 %is_code_block_token
19685 @opening_brace_names
19686 @closing_brace_names
19687 %is_keyword_taking_list
19688 %is_q_qq_qw_qx_qr_s_y_tr_m
19691 # possible values of operator_expected()
19692 use constant TERM => -1;
19693 use constant UNKNOWN => 0;
19694 use constant OPERATOR => 1;
19696 # possible values of context
19697 use constant SCALAR_CONTEXT => -1;
19698 use constant UNKNOWN_CONTEXT => 0;
19699 use constant LIST_CONTEXT => 1;
19701 # Maximum number of little messages; probably need not be changed.
19702 use constant MAX_NAG_MESSAGES => 6;
19706 # methods to count instances
19708 sub get_count { $_count; }
19709 sub _increment_count { ++$_count }
19710 sub _decrement_count { --$_count }
19714 $_[0]->_decrement_count();
19721 # Note: 'tabs' and 'indent_columns' are temporary and should be
19724 source_object => undef,
19725 debugger_object => undef,
19726 diagnostics_object => undef,
19727 logger_object => undef,
19728 starting_level => undef,
19729 indent_columns => 4,
19731 look_for_hash_bang => 0,
19733 look_for_autoloader => 1,
19734 look_for_selfloader => 1,
19735 starting_line_number => 1,
19737 my %args = ( %defaults, @_ );
19739 # we are given an object with a get_line() method to supply source lines
19740 my $source_object = $args{source_object};
19742 # we create another object with a get_line() and peek_ahead() method
19743 my $line_buffer_object = Perl::Tidy::LineBuffer->new($source_object);
19745 # Tokenizer state data is as follows:
19746 # _rhere_target_list reference to list of here-doc targets
19747 # _here_doc_target the target string for a here document
19748 # _here_quote_character the type of here-doc quoting (" ' ` or none)
19749 # to determine if interpolation is done
19750 # _quote_target character we seek if chasing a quote
19751 # _line_start_quote line where we started looking for a long quote
19752 # _in_here_doc flag indicating if we are in a here-doc
19753 # _in_pod flag set if we are in pod documentation
19754 # _in_error flag set if we saw severe error (binary in script)
19755 # _in_data flag set if we are in __DATA__ section
19756 # _in_end flag set if we are in __END__ section
19757 # _in_format flag set if we are in a format description
19758 # _in_attribute_list flag telling if we are looking for attributes
19759 # _in_quote flag telling if we are chasing a quote
19760 # _starting_level indentation level of first line
19761 # _input_tabstr string denoting one indentation level of input file
19762 # _know_input_tabstr flag indicating if we know _input_tabstr
19763 # _line_buffer_object object with get_line() method to supply source code
19764 # _diagnostics_object place to write debugging information
19765 # _unexpected_error_count error count used to limit output
19766 # _lower_case_labels_at line numbers where lower case labels seen
19767 $tokenizer_self = {
19768 _rhere_target_list => [],
19770 _here_doc_target => "",
19771 _here_quote_character => "",
19777 _in_attribute_list => 0,
19779 _quote_target => "",
19780 _line_start_quote => -1,
19781 _starting_level => $args{starting_level},
19782 _know_starting_level => defined( $args{starting_level} ),
19783 _tabs => $args{tabs},
19784 _indent_columns => $args{indent_columns},
19785 _look_for_hash_bang => $args{look_for_hash_bang},
19786 _trim_qw => $args{trim_qw},
19787 _input_tabstr => "",
19788 _know_input_tabstr => -1,
19789 _last_line_number => $args{starting_line_number} - 1,
19790 _saw_perl_dash_P => 0,
19791 _saw_perl_dash_w => 0,
19792 _saw_use_strict => 0,
19793 _saw_v_string => 0,
19794 _look_for_autoloader => $args{look_for_autoloader},
19795 _look_for_selfloader => $args{look_for_selfloader},
19796 _saw_autoloader => 0,
19797 _saw_selfloader => 0,
19798 _saw_hash_bang => 0,
19801 _saw_negative_indentation => 0,
19802 _started_tokenizing => 0,
19803 _line_buffer_object => $line_buffer_object,
19804 _debugger_object => $args{debugger_object},
19805 _diagnostics_object => $args{diagnostics_object},
19806 _logger_object => $args{logger_object},
19807 _unexpected_error_count => 0,
19808 _started_looking_for_here_target_at => 0,
19809 _nearly_matched_here_target_at => undef,
19811 _rlower_case_labels_at => undef,
19814 prepare_for_a_new_file();
19815 find_starting_indentation_level();
19817 bless $tokenizer_self, $class;
19819 # This is not a full class yet, so die if an attempt is made to
19820 # create more than one object.
19822 if ( _increment_count() > 1 ) {
19824 "Attempt to create more than 1 object in $class, which is not a true class yet\n";
19827 return $tokenizer_self;
19831 # interface to Perl::Tidy::Logger routines
19833 my $logger_object = $tokenizer_self->{_logger_object};
19834 if ($logger_object) {
19835 $logger_object->warning(@_);
19840 my $logger_object = $tokenizer_self->{_logger_object};
19841 if ($logger_object) {
19842 $logger_object->complain(@_);
19846 sub write_logfile_entry {
19847 my $logger_object = $tokenizer_self->{_logger_object};
19848 if ($logger_object) {
19849 $logger_object->write_logfile_entry(@_);
19853 sub interrupt_logfile {
19854 my $logger_object = $tokenizer_self->{_logger_object};
19855 if ($logger_object) {
19856 $logger_object->interrupt_logfile();
19860 sub resume_logfile {
19861 my $logger_object = $tokenizer_self->{_logger_object};
19862 if ($logger_object) {
19863 $logger_object->resume_logfile();
19867 sub increment_brace_error {
19868 my $logger_object = $tokenizer_self->{_logger_object};
19869 if ($logger_object) {
19870 $logger_object->increment_brace_error();
19874 sub report_definite_bug {
19875 my $logger_object = $tokenizer_self->{_logger_object};
19876 if ($logger_object) {
19877 $logger_object->report_definite_bug();
19881 sub brace_warning {
19882 my $logger_object = $tokenizer_self->{_logger_object};
19883 if ($logger_object) {
19884 $logger_object->brace_warning(@_);
19888 sub get_saw_brace_error {
19889 my $logger_object = $tokenizer_self->{_logger_object};
19890 if ($logger_object) {
19891 $logger_object->get_saw_brace_error();
19898 # interface to Perl::Tidy::Diagnostics routines
19899 sub write_diagnostics {
19900 if ( $tokenizer_self->{_diagnostics_object} ) {
19901 $tokenizer_self->{_diagnostics_object}->write_diagnostics(@_);
19905 sub report_tokenization_errors {
19909 my $level = get_indentation_level();
19910 if ( $level != $tokenizer_self->{_starting_level} ) {
19911 warning("final indentation level: $level\n");
19914 check_final_nesting_depths();
19916 if ( $tokenizer_self->{_look_for_hash_bang}
19917 && !$tokenizer_self->{_saw_hash_bang} )
19920 "hit EOF without seeing hash-bang line; maybe don't need -x?\n");
19923 if ( $tokenizer_self->{_in_format} ) {
19924 warning("hit EOF while in format description\n");
19927 if ( $tokenizer_self->{_in_pod} ) {
19929 # Just write log entry if this is after __END__ or __DATA__
19930 # because this happens to often, and it is not likely to be
19932 if ( $tokenizer_self->{_saw_data} || $tokenizer_self->{_saw_end} ) {
19933 write_logfile_entry(
19934 "hit eof while in pod documentation (no =cut seen)\n\tthis can cause trouble with some pod utilities\n"
19940 "hit eof while in pod documentation (no =cut seen)\n\tthis can cause trouble with some pod utilities\n"
19946 if ( $tokenizer_self->{_in_here_doc} ) {
19947 my $here_doc_target = $tokenizer_self->{_here_doc_target};
19948 my $started_looking_for_here_target_at =
19949 $tokenizer_self->{_started_looking_for_here_target_at};
19950 if ($here_doc_target) {
19952 "hit EOF in here document starting at line $started_looking_for_here_target_at with target: $here_doc_target\n"
19957 "hit EOF in here document starting at line $started_looking_for_here_target_at with empty target string\n"
19960 my $nearly_matched_here_target_at =
19961 $tokenizer_self->{_nearly_matched_here_target_at};
19962 if ($nearly_matched_here_target_at) {
19964 "NOTE: almost matched at input line $nearly_matched_here_target_at except for whitespace\n"
19969 if ( $tokenizer_self->{_in_quote} ) {
19970 my $line_start_quote = $tokenizer_self->{_line_start_quote};
19971 my $quote_target = $tokenizer_self->{_quote_target};
19973 ( $tokenizer_self->{_in_attribute_list} )
19977 "hit EOF seeking end of $what starting at line $line_start_quote ending in $quote_target\n"
19981 unless ( $tokenizer_self->{_saw_perl_dash_w} ) {
19982 if ( $] < 5.006 ) {
19983 write_logfile_entry("Suggest including '-w parameter'\n");
19986 write_logfile_entry("Suggest including 'use warnings;'\n");
19990 if ( $tokenizer_self->{_saw_perl_dash_P} ) {
19991 write_logfile_entry("Use of -P parameter for defines is discouraged\n");
19994 unless ( $tokenizer_self->{_saw_use_strict} ) {
19995 write_logfile_entry("Suggest including 'use strict;'\n");
19998 # it is suggested that lables have at least one upper case character
19999 # for legibility and to avoid code breakage as new keywords are introduced
20000 if ( $tokenizer_self->{_rlower_case_labels_at} ) {
20001 my @lower_case_labels_at =
20002 @{ $tokenizer_self->{_rlower_case_labels_at} };
20003 write_logfile_entry(
20004 "Suggest using upper case characters in label(s)\n");
20006 write_logfile_entry(" defined at line(s): (@lower_case_labels_at)\n");
20010 sub report_v_string {
20012 # warn if this version can't handle v-strings
20014 unless ( $tokenizer_self->{_saw_v_string} ) {
20015 $tokenizer_self->{_saw_v_string} = $tokenizer_self->{_last_line_number};
20017 if ( $] < 5.006 ) {
20019 "Found v-string '$tok' but v-strings are not implemented in your version of perl; see Camel 3 book ch 2\n"
20024 sub get_input_line_number {
20025 return $tokenizer_self->{_last_line_number};
20028 # returns the next tokenized line
20033 # USES GLOBAL VARIABLES: $tokenizer_self, $brace_depth,
20034 # $square_bracket_depth, $paren_depth
20036 my $input_line = $tokenizer_self->{_line_buffer_object}->get_line();
20037 $tokenizer_self->{_line_text} = $input_line;
20039 return undef unless ($input_line);
20041 my $input_line_number = ++$tokenizer_self->{_last_line_number};
20043 # Find and remove what characters terminate this line, including any
20045 my $input_line_separator = "";
20046 if ( chomp($input_line) ) { $input_line_separator = $/ }
20048 # TODO: what other characters should be included here?
20049 if ( $input_line =~ s/((\r|\035|\032)+)$// ) {
20050 $input_line_separator = $2 . $input_line_separator;
20053 # for backwards compatability we keep the line text terminated with
20054 # a newline character
20055 $input_line .= "\n";
20056 $tokenizer_self->{_line_text} = $input_line; # update
20058 # create a data structure describing this line which will be
20059 # returned to the caller.
20061 # _line_type codes are:
20062 # SYSTEM - system-specific code before hash-bang line
20063 # CODE - line of perl code (including comments)
20064 # POD_START - line starting pod, such as '=head'
20065 # POD - pod documentation text
20066 # POD_END - last line of pod section, '=cut'
20067 # HERE - text of here-document
20068 # HERE_END - last line of here-doc (target word)
20069 # FORMAT - format section
20070 # FORMAT_END - last line of format section, '.'
20071 # DATA_START - __DATA__ line
20072 # DATA - unidentified text following __DATA__
20073 # END_START - __END__ line
20074 # END - unidentified text following __END__
20075 # ERROR - we are in big trouble, probably not a perl script
20078 # _curly_brace_depth - depth of curly braces at start of line
20079 # _square_bracket_depth - depth of square brackets at start of line
20080 # _paren_depth - depth of parens at start of line
20081 # _starting_in_quote - this line continues a multi-line quote
20082 # (so don't trim leading blanks!)
20083 # _ending_in_quote - this line ends in a multi-line quote
20084 # (so don't trim trailing blanks!)
20085 my $line_of_tokens = {
20086 _line_type => 'EOF',
20087 _line_text => $input_line,
20088 _line_number => $input_line_number,
20089 _rtoken_type => undef,
20092 _rslevels => undef,
20093 _rblock_type => undef,
20094 _rcontainer_type => undef,
20095 _rcontainer_environment => undef,
20096 _rtype_sequence => undef,
20097 _rnesting_tokens => undef,
20098 _rci_levels => undef,
20099 _rnesting_blocks => undef,
20100 _python_indentation_level => -1, ## 0,
20101 _starting_in_quote => 0, # to be set by subroutine
20102 _ending_in_quote => 0,
20103 _curly_brace_depth => $brace_depth,
20104 _square_bracket_depth => $square_bracket_depth,
20105 _paren_depth => $paren_depth,
20106 _quote_character => '',
20109 # must print line unchanged if we are in a here document
20110 if ( $tokenizer_self->{_in_here_doc} ) {
20112 $line_of_tokens->{_line_type} = 'HERE';
20113 my $here_doc_target = $tokenizer_self->{_here_doc_target};
20114 my $here_quote_character = $tokenizer_self->{_here_quote_character};
20115 my $candidate_target = $input_line;
20116 chomp $candidate_target;
20117 if ( $candidate_target eq $here_doc_target ) {
20118 $tokenizer_self->{_nearly_matched_here_target_at} = undef;
20119 $line_of_tokens->{_line_type} = 'HERE_END';
20120 write_logfile_entry("Exiting HERE document $here_doc_target\n");
20122 my $rhere_target_list = $tokenizer_self->{_rhere_target_list};
20123 if (@$rhere_target_list) { # there can be multiple here targets
20124 ( $here_doc_target, $here_quote_character ) =
20125 @{ shift @$rhere_target_list };
20126 $tokenizer_self->{_here_doc_target} = $here_doc_target;
20127 $tokenizer_self->{_here_quote_character} =
20128 $here_quote_character;
20129 write_logfile_entry(
20130 "Entering HERE document $here_doc_target\n");
20131 $tokenizer_self->{_nearly_matched_here_target_at} = undef;
20132 $tokenizer_self->{_started_looking_for_here_target_at} =
20133 $input_line_number;
20136 $tokenizer_self->{_in_here_doc} = 0;
20137 $tokenizer_self->{_here_doc_target} = "";
20138 $tokenizer_self->{_here_quote_character} = "";
20142 # check for error of extra whitespace
20143 # note for PERL6: leading whitespace is allowed
20145 $candidate_target =~ s/\s*$//;
20146 $candidate_target =~ s/^\s*//;
20147 if ( $candidate_target eq $here_doc_target ) {
20148 $tokenizer_self->{_nearly_matched_here_target_at} =
20149 $input_line_number;
20152 return $line_of_tokens;
20155 # must print line unchanged if we are in a format section
20156 elsif ( $tokenizer_self->{_in_format} ) {
20158 if ( $input_line =~ /^\.[\s#]*$/ ) {
20159 write_logfile_entry("Exiting format section\n");
20160 $tokenizer_self->{_in_format} = 0;
20161 $line_of_tokens->{_line_type} = 'FORMAT_END';
20164 $line_of_tokens->{_line_type} = 'FORMAT';
20166 return $line_of_tokens;
20169 # must print line unchanged if we are in pod documentation
20170 elsif ( $tokenizer_self->{_in_pod} ) {
20172 $line_of_tokens->{_line_type} = 'POD';
20173 if ( $input_line =~ /^=cut/ ) {
20174 $line_of_tokens->{_line_type} = 'POD_END';
20175 write_logfile_entry("Exiting POD section\n");
20176 $tokenizer_self->{_in_pod} = 0;
20178 if ( $input_line =~ /^\#\!.*perl\b/ ) {
20180 "Hash-bang in pod can cause older versions of perl to fail! \n"
20184 return $line_of_tokens;
20187 # must print line unchanged if we have seen a severe error (i.e., we
20188 # are seeing illegal tokens and connot continue. Syntax errors do
20189 # not pass this route). Calling routine can decide what to do, but
20190 # the default can be to just pass all lines as if they were after __END__
20191 elsif ( $tokenizer_self->{_in_error} ) {
20192 $line_of_tokens->{_line_type} = 'ERROR';
20193 return $line_of_tokens;
20196 # print line unchanged if we are __DATA__ section
20197 elsif ( $tokenizer_self->{_in_data} ) {
20199 # ...but look for POD
20200 # Note that the _in_data and _in_end flags remain set
20201 # so that we return to that state after seeing the
20202 # end of a pod section
20203 if ( $input_line =~ /^=(?!cut)/ ) {
20204 $line_of_tokens->{_line_type} = 'POD_START';
20205 write_logfile_entry("Entering POD section\n");
20206 $tokenizer_self->{_in_pod} = 1;
20207 return $line_of_tokens;
20210 $line_of_tokens->{_line_type} = 'DATA';
20211 return $line_of_tokens;
20215 # print line unchanged if we are in __END__ section
20216 elsif ( $tokenizer_self->{_in_end} ) {
20218 # ...but look for POD
20219 # Note that the _in_data and _in_end flags remain set
20220 # so that we return to that state after seeing the
20221 # end of a pod section
20222 if ( $input_line =~ /^=(?!cut)/ ) {
20223 $line_of_tokens->{_line_type} = 'POD_START';
20224 write_logfile_entry("Entering POD section\n");
20225 $tokenizer_self->{_in_pod} = 1;
20226 return $line_of_tokens;
20229 $line_of_tokens->{_line_type} = 'END';
20230 return $line_of_tokens;
20234 # check for a hash-bang line if we haven't seen one
20235 if ( !$tokenizer_self->{_saw_hash_bang} ) {
20236 if ( $input_line =~ /^\#\!.*perl\b/ ) {
20237 $tokenizer_self->{_saw_hash_bang} = $input_line_number;
20239 # check for -w and -P flags
20240 if ( $input_line =~ /^\#\!.*perl\s.*-.*P/ ) {
20241 $tokenizer_self->{_saw_perl_dash_P} = 1;
20244 if ( $input_line =~ /^\#\!.*perl\s.*-.*w/ ) {
20245 $tokenizer_self->{_saw_perl_dash_w} = 1;
20248 if ( ( $input_line_number > 1 )
20249 && ( !$tokenizer_self->{_look_for_hash_bang} ) )
20252 # this is helpful for VMS systems; we may have accidentally
20253 # tokenized some DCL commands
20254 if ( $tokenizer_self->{_started_tokenizing} ) {
20256 "There seems to be a hash-bang after line 1; do you need to run with -x ?\n"
20260 complain("Useless hash-bang after line 1\n");
20264 # Report the leading hash-bang as a system line
20265 # This will prevent -dac from deleting it
20267 $line_of_tokens->{_line_type} = 'SYSTEM';
20268 return $line_of_tokens;
20273 # wait for a hash-bang before parsing if the user invoked us with -x
20274 if ( $tokenizer_self->{_look_for_hash_bang}
20275 && !$tokenizer_self->{_saw_hash_bang} )
20277 $line_of_tokens->{_line_type} = 'SYSTEM';
20278 return $line_of_tokens;
20281 # a first line of the form ': #' will be marked as SYSTEM
20282 # since lines of this form may be used by tcsh
20283 if ( $input_line_number == 1 && $input_line =~ /^\s*\:\s*\#/ ) {
20284 $line_of_tokens->{_line_type} = 'SYSTEM';
20285 return $line_of_tokens;
20288 # now we know that it is ok to tokenize the line...
20289 # the line tokenizer will modify any of these private variables:
20290 # _rhere_target_list
20297 my $ending_in_quote_last = $tokenizer_self->{_in_quote};
20298 tokenize_this_line($line_of_tokens);
20300 # Now finish defining the return structure and return it
20301 $line_of_tokens->{_ending_in_quote} = $tokenizer_self->{_in_quote};
20303 # handle severe error (binary data in script)
20304 if ( $tokenizer_self->{_in_error} ) {
20305 $tokenizer_self->{_in_quote} = 0; # to avoid any more messages
20306 warning("Giving up after error\n");
20307 $line_of_tokens->{_line_type} = 'ERROR';
20308 reset_indentation_level(0); # avoid error messages
20309 return $line_of_tokens;
20312 # handle start of pod documentation
20313 if ( $tokenizer_self->{_in_pod} ) {
20315 # This gets tricky..above a __DATA__ or __END__ section, perl
20316 # accepts '=cut' as the start of pod section. But afterwards,
20317 # only pod utilities see it and they may ignore an =cut without
20318 # leading =head. In any case, this isn't good.
20319 if ( $input_line =~ /^=cut\b/ ) {
20320 if ( $tokenizer_self->{_saw_data} || $tokenizer_self->{_saw_end} ) {
20321 complain("=cut while not in pod ignored\n");
20322 $tokenizer_self->{_in_pod} = 0;
20323 $line_of_tokens->{_line_type} = 'POD_STOP';
20326 $line_of_tokens->{_line_type} = 'POD_END';
20328 "=cut starts a pod section .. this can fool pod utilities.\n"
20330 write_logfile_entry("Entering POD section\n");
20335 $line_of_tokens->{_line_type} = 'POD_START';
20336 write_logfile_entry("Entering POD section\n");
20339 return $line_of_tokens;
20342 # update indentation levels for log messages
20343 if ( $input_line !~ /^\s*$/ ) {
20344 my $rlevels = $line_of_tokens->{_rlevels};
20345 my $structural_indentation_level = $$rlevels[0];
20346 my ( $python_indentation_level, $msg ) =
20347 find_indentation_level( $input_line, $structural_indentation_level );
20348 if ($msg) { write_logfile_entry("$msg") }
20349 if ( $tokenizer_self->{_know_input_tabstr} == 1 ) {
20350 $line_of_tokens->{_python_indentation_level} =
20351 $python_indentation_level;
20355 # see if this line contains here doc targets
20356 my $rhere_target_list = $tokenizer_self->{_rhere_target_list};
20357 if (@$rhere_target_list) {
20359 my ( $here_doc_target, $here_quote_character ) =
20360 @{ shift @$rhere_target_list };
20361 $tokenizer_self->{_in_here_doc} = 1;
20362 $tokenizer_self->{_here_doc_target} = $here_doc_target;
20363 $tokenizer_self->{_here_quote_character} = $here_quote_character;
20364 write_logfile_entry("Entering HERE document $here_doc_target\n");
20365 $tokenizer_self->{_started_looking_for_here_target_at} =
20366 $input_line_number;
20369 # NOTE: __END__ and __DATA__ statements are written unformatted
20370 # because they can theoretically contain additional characters
20371 # which are not tokenized (and cannot be read with <DATA> either!).
20372 if ( $tokenizer_self->{_in_data} ) {
20373 $line_of_tokens->{_line_type} = 'DATA_START';
20374 write_logfile_entry("Starting __DATA__ section\n");
20375 $tokenizer_self->{_saw_data} = 1;
20377 # keep parsing after __DATA__ if use SelfLoader was seen
20378 if ( $tokenizer_self->{_saw_selfloader} ) {
20379 $tokenizer_self->{_in_data} = 0;
20380 write_logfile_entry(
20381 "SelfLoader seen, continuing; -nlsl deactivates\n");
20384 return $line_of_tokens;
20387 elsif ( $tokenizer_self->{_in_end} ) {
20388 $line_of_tokens->{_line_type} = 'END_START';
20389 write_logfile_entry("Starting __END__ section\n");
20390 $tokenizer_self->{_saw_end} = 1;
20392 # keep parsing after __END__ if use AutoLoader was seen
20393 if ( $tokenizer_self->{_saw_autoloader} ) {
20394 $tokenizer_self->{_in_end} = 0;
20395 write_logfile_entry(
20396 "AutoLoader seen, continuing; -nlal deactivates\n");
20398 return $line_of_tokens;
20401 # now, finally, we know that this line is type 'CODE'
20402 $line_of_tokens->{_line_type} = 'CODE';
20404 # remember if we have seen any real code
20405 if ( !$tokenizer_self->{_started_tokenizing}
20406 && $input_line !~ /^\s*$/
20407 && $input_line !~ /^\s*#/ )
20409 $tokenizer_self->{_started_tokenizing} = 1;
20412 if ( $tokenizer_self->{_debugger_object} ) {
20413 $tokenizer_self->{_debugger_object}->write_debug_entry($line_of_tokens);
20416 # Note: if keyword 'format' occurs in this line code, it is still CODE
20417 # (keyword 'format' need not start a line)
20418 if ( $tokenizer_self->{_in_format} ) {
20419 write_logfile_entry("Entering format section\n");
20422 if ( $tokenizer_self->{_in_quote}
20423 and ( $tokenizer_self->{_line_start_quote} < 0 ) )
20426 #if ( ( my $quote_target = get_quote_target() ) !~ /^\s*$/ ) {
20428 ( my $quote_target = $tokenizer_self->{_quote_target} ) !~ /^\s*$/ )
20430 $tokenizer_self->{_line_start_quote} = $input_line_number;
20431 write_logfile_entry(
20432 "Start multi-line quote or pattern ending in $quote_target\n");
20435 elsif ( ( $tokenizer_self->{_line_start_quote} >= 0 )
20436 and !$tokenizer_self->{_in_quote} )
20438 $tokenizer_self->{_line_start_quote} = -1;
20439 write_logfile_entry("End of multi-line quote or pattern\n");
20442 # we are returning a line of CODE
20443 return $line_of_tokens;
20446 sub find_starting_indentation_level {
20448 # USES GLOBAL VARIABLES: $tokenizer_self
20449 my $starting_level = 0;
20450 my $know_input_tabstr = -1; # flag for find_indentation_level
20452 # use value if given as parameter
20453 if ( $tokenizer_self->{_know_starting_level} ) {
20454 $starting_level = $tokenizer_self->{_starting_level};
20457 # if we know there is a hash_bang line, the level must be zero
20458 elsif ( $tokenizer_self->{_look_for_hash_bang} ) {
20459 $tokenizer_self->{_know_starting_level} = 1;
20462 # otherwise figure it out from the input file
20466 my $structural_indentation_level = -1; # flag for find_indentation_level
20470 $tokenizer_self->{_line_buffer_object}->peek_ahead( $i++ ) )
20473 # if first line is #! then assume starting level is zero
20474 if ( $i == 1 && $line =~ /^\#\!/ ) {
20475 $starting_level = 0;
20478 next if ( $line =~ /^\s*#/ ); # must not be comment
20479 next if ( $line =~ /^\s*$/ ); # must not be blank
20480 ( $starting_level, $msg ) =
20481 find_indentation_level( $line, $structural_indentation_level );
20482 if ($msg) { write_logfile_entry("$msg") }
20485 $msg = "Line $i implies starting-indentation-level = $starting_level\n";
20487 if ( $starting_level > 0 ) {
20489 my $input_tabstr = $tokenizer_self->{_input_tabstr};
20490 if ( $input_tabstr eq "\t" ) {
20491 $msg .= "by guessing input tabbing uses 1 tab per level\n";
20494 my $cols = length($input_tabstr);
20496 "by guessing input tabbing uses $cols blanks per level\n";
20499 write_logfile_entry("$msg");
20501 $tokenizer_self->{_starting_level} = $starting_level;
20502 reset_indentation_level($starting_level);
20505 # Find indentation level given a input line. At the same time, try to
20506 # figure out the input tabbing scheme.
20508 # There are two types of calls:
20510 # Type 1: $structural_indentation_level < 0
20511 # In this case we have to guess $input_tabstr to figure out the level.
20513 # Type 2: $structural_indentation_level >= 0
20514 # In this case the level of this line is known, and this routine can
20515 # update the tabbing string, if still unknown, to make the level correct.
20517 sub find_indentation_level {
20518 my ( $line, $structural_indentation_level ) = @_;
20520 # USES GLOBAL VARIABLES: $tokenizer_self
20524 my $know_input_tabstr = $tokenizer_self->{_know_input_tabstr};
20525 my $input_tabstr = $tokenizer_self->{_input_tabstr};
20527 # find leading whitespace
20528 my $leading_whitespace = ( $line =~ /^(\s*)/ ) ? $1 : "";
20530 # make first guess at input tabbing scheme if necessary
20531 if ( $know_input_tabstr < 0 ) {
20533 $know_input_tabstr = 0;
20535 if ( $tokenizer_self->{_tabs} ) {
20536 $input_tabstr = "\t";
20537 if ( length($leading_whitespace) > 0 ) {
20538 if ( $leading_whitespace !~ /\t/ ) {
20540 my $cols = $tokenizer_self->{_indent_columns};
20542 if ( length($leading_whitespace) < $cols ) {
20543 $cols = length($leading_whitespace);
20545 $input_tabstr = " " x $cols;
20550 $input_tabstr = " " x $tokenizer_self->{_indent_columns};
20552 if ( length($leading_whitespace) > 0 ) {
20553 if ( $leading_whitespace =~ /^\t/ ) {
20554 $input_tabstr = "\t";
20558 $tokenizer_self->{_know_input_tabstr} = $know_input_tabstr;
20559 $tokenizer_self->{_input_tabstr} = $input_tabstr;
20562 # determine the input tabbing scheme if possible
20563 if ( ( $know_input_tabstr == 0 )
20564 && ( length($leading_whitespace) > 0 )
20565 && ( $structural_indentation_level > 0 ) )
20567 my $saved_input_tabstr = $input_tabstr;
20569 # check for common case of one tab per indentation level
20570 if ( $leading_whitespace eq "\t" x $structural_indentation_level ) {
20571 if ( $leading_whitespace eq "\t" x $structural_indentation_level ) {
20572 $input_tabstr = "\t";
20573 $msg = "Guessing old indentation was tab character\n";
20579 # detab any tabs based on 8 blanks per tab
20581 if ( $leading_whitespace =~ s/^\t+/ /g ) {
20582 $entabbed = "entabbed";
20585 # now compute tabbing from number of spaces
20587 length($leading_whitespace) / $structural_indentation_level;
20588 if ( $columns == int $columns ) {
20590 "Guessing old indentation was $columns $entabbed spaces\n";
20593 $columns = int $columns;
20595 "old indentation is unclear, using $columns $entabbed spaces\n";
20597 $input_tabstr = " " x $columns;
20599 $know_input_tabstr = 1;
20600 $tokenizer_self->{_know_input_tabstr} = $know_input_tabstr;
20601 $tokenizer_self->{_input_tabstr} = $input_tabstr;
20603 # see if mistakes were made
20604 if ( ( $tokenizer_self->{_starting_level} > 0 )
20605 && !$tokenizer_self->{_know_starting_level} )
20608 if ( $input_tabstr ne $saved_input_tabstr ) {
20610 "I made a bad starting level guess; rerun with a value for -sil \n"
20616 # use current guess at input tabbing to get input indentation level
20618 # Patch to handle a common case of entabbed leading whitespace
20619 # If the leading whitespace equals 4 spaces and we also have
20620 # tabs, detab the input whitespace assuming 8 spaces per tab.
20621 if ( length($input_tabstr) == 4 ) {
20622 $leading_whitespace =~ s/^\t+/ /g;
20625 if ( ( my $len_tab = length($input_tabstr) ) > 0 ) {
20628 while ( substr( $leading_whitespace, $pos, $len_tab ) eq $input_tabstr )
20634 return ( $level, $msg );
20637 # This is a currently unused debug routine
20638 sub dump_functions {
20642 foreach $pkg ( keys %is_user_function ) {
20643 print $fh "\nnon-constant subs in package $pkg\n";
20645 foreach $sub ( keys %{ $is_user_function{$pkg} } ) {
20647 if ( $is_block_list_function{$pkg}{$sub} ) {
20648 $msg = 'block_list';
20651 if ( $is_block_function{$pkg}{$sub} ) {
20654 print $fh "$sub $msg\n";
20658 foreach $pkg ( keys %is_constant ) {
20659 print $fh "\nconstants and constant subs in package $pkg\n";
20661 foreach $sub ( keys %{ $is_constant{$pkg} } ) {
20662 print $fh "$sub\n";
20669 # count number of 1's in a string of 1's and 0's
20670 # example: ones_count("010101010101") gives 6
20671 return ( my $cis = $_[0] ) =~ tr/1/0/;
20674 sub prepare_for_a_new_file {
20676 # previous tokens needed to determine what to expect next
20677 $last_nonblank_token = ';'; # the only possible starting state which
20678 $last_nonblank_type = ';'; # will make a leading brace a code block
20679 $last_nonblank_block_type = '';
20681 # scalars for remembering statement types across multiple lines
20682 $statement_type = ''; # '' or 'use' or 'sub..' or 'case..'
20683 $in_attribute_list = 0;
20685 # scalars for remembering where we are in the file
20686 $current_package = "main";
20687 $context = UNKNOWN_CONTEXT;
20689 # hashes used to remember function information
20690 %is_constant = (); # user-defined constants
20691 %is_user_function = (); # user-defined functions
20692 %user_function_prototype = (); # their prototypes
20693 %is_block_function = ();
20694 %is_block_list_function = ();
20695 %saw_function_definition = ();
20697 # variables used to track depths of various containers
20698 # and report nesting errors
20701 $square_bracket_depth = 0;
20702 @current_depth[ 0 .. $#closing_brace_names ] =
20703 (0) x scalar @closing_brace_names;
20704 @nesting_sequence_number[ 0 .. $#closing_brace_names ] =
20705 ( 0 .. $#closing_brace_names );
20706 @current_sequence_number = ();
20707 $paren_type[$paren_depth] = '';
20708 $paren_semicolon_count[$paren_depth] = 0;
20709 $paren_structural_type[$brace_depth] = '';
20710 $brace_type[$brace_depth] = ';'; # identify opening brace as code block
20711 $brace_structural_type[$brace_depth] = '';
20712 $brace_statement_type[$brace_depth] = "";
20713 $brace_context[$brace_depth] = UNKNOWN_CONTEXT;
20714 $brace_package[$paren_depth] = $current_package;
20715 $square_bracket_type[$square_bracket_depth] = '';
20716 $square_bracket_structural_type[$square_bracket_depth] = '';
20718 initialize_tokenizer_state();
20721 { # begin tokenize_this_line
20723 use constant BRACE => 0;
20724 use constant SQUARE_BRACKET => 1;
20725 use constant PAREN => 2;
20726 use constant QUESTION_COLON => 3;
20728 # TV1: scalars for processing one LINE.
20729 # Re-initialized on each entry to sub tokenize_this_line.
20731 $block_type, $container_type, $expecting,
20732 $i, $i_tok, $input_line,
20733 $input_line_number, $last_nonblank_i, $max_token_index,
20734 $next_tok, $next_type, $peeked_ahead,
20735 $prototype, $rhere_target_list, $rtoken_map,
20736 $rtoken_type, $rtokens, $tok,
20737 $type, $type_sequence,
20740 # TV2: refs to ARRAYS for processing one LINE
20741 # Re-initialized on each call.
20742 my $routput_token_list = []; # stack of output token indexes
20743 my $routput_token_type = []; # token types
20744 my $routput_block_type = []; # types of code block
20745 my $routput_container_type = []; # paren types, such as if, elsif, ..
20746 my $routput_type_sequence = []; # nesting sequential number
20748 # TV3: SCALARS for quote variables. These are initialized with a
20749 # subroutine call and continually updated as lines are processed.
20750 my ( $in_quote, $quote_type, $quote_character, $quote_pos, $quote_depth,
20751 $quoted_string_1, $quoted_string_2, $allowed_quote_modifiers, );
20753 # TV4: SCALARS for multi-line identifiers and
20754 # statements. These are initialized with a subroutine call
20755 # and continually updated as lines are processed.
20756 my ( $id_scan_state, $identifier, $want_paren, );
20758 # TV5: SCALARS for tracking indentation level.
20759 # Initialized once and continually updated as lines are
20762 $nesting_token_string, $nesting_type_string,
20763 $nesting_block_string, $nesting_block_flag,
20764 $nesting_list_string, $nesting_list_flag,
20765 $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
20766 $in_statement_continuation, $level_in_tokenizer,
20767 $slevel_in_tokenizer, $rslevel_stack,
20770 # TV6: SCALARS for remembering several previous
20771 # tokens. Initialized once and continually updated as
20772 # lines are processed.
20774 $last_nonblank_container_type, $last_nonblank_type_sequence,
20775 $last_last_nonblank_token, $last_last_nonblank_type,
20776 $last_last_nonblank_block_type, $last_last_nonblank_container_type,
20777 $last_last_nonblank_type_sequence, $last_nonblank_prototype,
20780 # ----------------------------------------------------------------
20781 # beginning of tokenizer variable access and manipulation routines
20782 # ----------------------------------------------------------------
20784 sub initialize_tokenizer_state {
20786 # TV1: initialized on each call
20787 # TV2: initialized on each call
20791 $quote_character = "";
20794 $quoted_string_1 = "";
20795 $quoted_string_2 = "";
20796 $allowed_quote_modifiers = "";
20799 $id_scan_state = '';
20804 $nesting_token_string = "";
20805 $nesting_type_string = "";
20806 $nesting_block_string = '1'; # initially in a block
20807 $nesting_block_flag = 1;
20808 $nesting_list_string = '0'; # initially not in a list
20809 $nesting_list_flag = 0; # initially not in a list
20810 $ci_string_in_tokenizer = "";
20811 $continuation_string_in_tokenizer = "0";
20812 $in_statement_continuation = 0;
20813 $level_in_tokenizer = 0;
20814 $slevel_in_tokenizer = 0;
20815 $rslevel_stack = [];
20818 $last_nonblank_container_type = '';
20819 $last_nonblank_type_sequence = '';
20820 $last_last_nonblank_token = ';';
20821 $last_last_nonblank_type = ';';
20822 $last_last_nonblank_block_type = '';
20823 $last_last_nonblank_container_type = '';
20824 $last_last_nonblank_type_sequence = '';
20825 $last_nonblank_prototype = "";
20828 sub save_tokenizer_state {
20831 $block_type, $container_type, $expecting,
20832 $i, $i_tok, $input_line,
20833 $input_line_number, $last_nonblank_i, $max_token_index,
20834 $next_tok, $next_type, $peeked_ahead,
20835 $prototype, $rhere_target_list, $rtoken_map,
20836 $rtoken_type, $rtokens, $tok,
20837 $type, $type_sequence,
20841 $routput_token_list, $routput_token_type,
20842 $routput_block_type, $routput_container_type,
20843 $routput_type_sequence,
20847 $in_quote, $quote_type,
20848 $quote_character, $quote_pos,
20849 $quote_depth, $quoted_string_1,
20850 $quoted_string_2, $allowed_quote_modifiers,
20853 my $rTV4 = [ $id_scan_state, $identifier, $want_paren, ];
20856 $nesting_token_string, $nesting_type_string,
20857 $nesting_block_string, $nesting_block_flag,
20858 $nesting_list_string, $nesting_list_flag,
20859 $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
20860 $in_statement_continuation, $level_in_tokenizer,
20861 $slevel_in_tokenizer, $rslevel_stack,
20865 $last_nonblank_container_type,
20866 $last_nonblank_type_sequence,
20867 $last_last_nonblank_token,
20868 $last_last_nonblank_type,
20869 $last_last_nonblank_block_type,
20870 $last_last_nonblank_container_type,
20871 $last_last_nonblank_type_sequence,
20872 $last_nonblank_prototype,
20874 return [ $rTV1, $rTV2, $rTV3, $rTV4, $rTV5, $rTV6 ];
20877 sub restore_tokenizer_state {
20879 my ( $rTV1, $rTV2, $rTV3, $rTV4, $rTV5, $rTV6 ) = @{$rstate};
20881 $block_type, $container_type, $expecting,
20882 $i, $i_tok, $input_line,
20883 $input_line_number, $last_nonblank_i, $max_token_index,
20884 $next_tok, $next_type, $peeked_ahead,
20885 $prototype, $rhere_target_list, $rtoken_map,
20886 $rtoken_type, $rtokens, $tok,
20887 $type, $type_sequence,
20891 $routput_token_list, $routput_token_type,
20892 $routput_block_type, $routput_container_type,
20893 $routput_type_sequence,
20897 $in_quote, $quote_type, $quote_character, $quote_pos, $quote_depth,
20898 $quoted_string_1, $quoted_string_2, $allowed_quote_modifiers,
20901 ( $id_scan_state, $identifier, $want_paren, ) = @{$rTV4};
20904 $nesting_token_string, $nesting_type_string,
20905 $nesting_block_string, $nesting_block_flag,
20906 $nesting_list_string, $nesting_list_flag,
20907 $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
20908 $in_statement_continuation, $level_in_tokenizer,
20909 $slevel_in_tokenizer, $rslevel_stack,
20913 $last_nonblank_container_type,
20914 $last_nonblank_type_sequence,
20915 $last_last_nonblank_token,
20916 $last_last_nonblank_type,
20917 $last_last_nonblank_block_type,
20918 $last_last_nonblank_container_type,
20919 $last_last_nonblank_type_sequence,
20920 $last_nonblank_prototype,
20924 sub get_indentation_level {
20925 return $level_in_tokenizer;
20928 sub reset_indentation_level {
20929 $level_in_tokenizer = $_[0];
20930 $slevel_in_tokenizer = $_[0];
20931 push @{$rslevel_stack}, $slevel_in_tokenizer;
20935 $peeked_ahead = defined( $_[0] ) ? $_[0] : $peeked_ahead;
20938 # ------------------------------------------------------------
20939 # end of tokenizer variable access and manipulation routines
20940 # ------------------------------------------------------------
20942 # ------------------------------------------------------------
20943 # beginning of various scanner interface routines
20944 # ------------------------------------------------------------
20945 sub scan_replacement_text {
20947 # check for here-docs in replacement text invoked by
20948 # a substitution operator with executable modifier 'e'.
20951 # $replacement_text
20953 # $rht = reference to any here-doc targets
20954 my ($replacement_text) = @_;
20957 return undef unless ( $replacement_text =~ /<</ );
20959 write_logfile_entry("scanning replacement text for here-doc targets\n");
20961 # save the logger object for error messages
20962 my $logger_object = $tokenizer_self->{_logger_object};
20964 # localize all package variables
20966 $tokenizer_self, $last_nonblank_token,
20967 $last_nonblank_type, $last_nonblank_block_type,
20968 $statement_type, $in_attribute_list,
20969 $current_package, $context,
20970 %is_constant, %is_user_function,
20971 %user_function_prototype, %is_block_function,
20972 %is_block_list_function, %saw_function_definition,
20973 $brace_depth, $paren_depth,
20974 $square_bracket_depth, @current_depth,
20975 @nesting_sequence_number, @current_sequence_number,
20976 @paren_type, @paren_semicolon_count,
20977 @paren_structural_type, @brace_type,
20978 @brace_structural_type, @brace_statement_type,
20979 @brace_context, @brace_package,
20980 @square_bracket_type, @square_bracket_structural_type,
20981 @depth_array, @starting_line_of_current_depth,
20984 # save all lexical variables
20985 my $rstate = save_tokenizer_state();
20986 _decrement_count(); # avoid error check for multiple tokenizers
20988 # make a new tokenizer
20990 my $rpending_logfile_message;
20991 my $source_object =
20992 Perl::Tidy::LineSource->new( \$replacement_text, $rOpts,
20993 $rpending_logfile_message );
20994 my $tokenizer = Perl::Tidy::Tokenizer->new(
20995 source_object => $source_object,
20996 logger_object => $logger_object,
20997 starting_line_number => $input_line_number,
21000 # scan the replacement text
21001 1 while ( $tokenizer->get_line() );
21003 # remove any here doc targets
21005 if ( $tokenizer_self->{_in_here_doc} ) {
21009 $tokenizer_self->{_here_doc_target},
21010 $tokenizer_self->{_here_quote_character}
21012 if ( $tokenizer_self->{_rhere_target_list} ) {
21013 push @{$rht}, @{ $tokenizer_self->{_rhere_target_list} };
21014 $tokenizer_self->{_rhere_target_list} = undef;
21016 $tokenizer_self->{_in_here_doc} = undef;
21019 # now its safe to report errors
21020 $tokenizer->report_tokenization_errors();
21022 # restore all tokenizer lexical variables
21023 restore_tokenizer_state($rstate);
21025 # return the here doc targets
21029 sub scan_bare_identifier {
21030 ( $i, $tok, $type, $prototype ) =
21031 scan_bare_identifier_do( $input_line, $i, $tok, $type, $prototype,
21032 $rtoken_map, $max_token_index );
21035 sub scan_identifier {
21036 ( $i, $tok, $type, $id_scan_state, $identifier ) =
21037 scan_identifier_do( $i, $id_scan_state, $identifier, $rtokens,
21038 $max_token_index );
21042 ( $i, $tok, $type, $id_scan_state ) =
21043 scan_id_do( $input_line, $i, $tok, $rtokens, $rtoken_map,
21044 $id_scan_state, $max_token_index );
21049 ( $i, $type, $number ) =
21050 scan_number_do( $input_line, $i, $rtoken_map, $type,
21051 $max_token_index );
21055 # a sub to warn if token found where term expected
21056 sub error_if_expecting_TERM {
21057 if ( $expecting == TERM ) {
21058 if ( $really_want_term{$last_nonblank_type} ) {
21059 unexpected( $tok, "term", $i_tok, $last_nonblank_i, $rtoken_map,
21060 $rtoken_type, $input_line );
21066 # a sub to warn if token found where operator expected
21067 sub error_if_expecting_OPERATOR {
21068 if ( $expecting == OPERATOR ) {
21069 my $thing = defined $_[0] ? $_[0] : $tok;
21070 unexpected( $thing, "operator", $i_tok, $last_nonblank_i,
21071 $rtoken_map, $rtoken_type, $input_line );
21072 if ( $i_tok == 0 ) {
21073 interrupt_logfile();
21074 warning("Missing ';' above?\n");
21081 # ------------------------------------------------------------
21082 # end scanner interfaces
21083 # ------------------------------------------------------------
21085 my %is_for_foreach;
21086 @_ = qw(for foreach);
21087 @is_for_foreach{@_} = (1) x scalar(@_);
21091 @is_my_our{@_} = (1) x scalar(@_);
21093 # These keywords may introduce blocks after parenthesized expressions,
21095 # keyword ( .... ) { BLOCK }
21096 # patch for SWITCH/CASE: added 'switch' 'case' 'given' 'when'
21097 my %is_blocktype_with_paren;
21098 @_ = qw(if elsif unless while until for foreach switch case given when);
21099 @is_blocktype_with_paren{@_} = (1) x scalar(@_);
21101 # ------------------------------------------------------------
21102 # begin hash of code for handling most token types
21103 # ------------------------------------------------------------
21104 my $tokenization_code = {
21106 # no special code for these types yet, but syntax checks
21141 error_if_expecting_TERM()
21142 if ( $expecting == TERM );
21145 error_if_expecting_TERM()
21146 if ( $expecting == TERM );
21150 # start looking for a scalar
21151 error_if_expecting_OPERATOR("Scalar")
21152 if ( $expecting == OPERATOR );
21155 if ( $identifier eq '$^W' ) {
21156 $tokenizer_self->{_saw_perl_dash_w} = 1;
21159 # Check for indentifier in indirect object slot
21160 # (vorboard.pl, sort.t). Something like:
21161 # /^(print|printf|sort|exec|system)$/
21163 $is_indirect_object_taker{$last_nonblank_token}
21165 || ( ( $last_nonblank_token eq '(' )
21166 && $is_indirect_object_taker{ $paren_type[$paren_depth] } )
21167 || ( $last_nonblank_type =~ /^[Uw]$/ ) # possible object
21176 $paren_semicolon_count[$paren_depth] = 0;
21178 $container_type = $want_paren;
21182 $container_type = $last_nonblank_token;
21184 # We can check for a syntax error here of unexpected '(',
21185 # but this is going to get messy...
21187 $expecting == OPERATOR
21189 # be sure this is not a method call of the form
21190 # &method(...), $method->(..), &{method}(...),
21191 # $ref[2](list) is ok & short for $ref[2]->(list)
21192 # NOTE: at present, braces in something like &{ xxx }
21193 # are not marked as a block, we might have a method call
21194 && $last_nonblank_token !~ /^([\]\}\&]|\-\>)/
21199 # ref: camel 3 p 703.
21200 if ( $last_last_nonblank_token eq 'do' ) {
21202 "do SUBROUTINE is deprecated; consider & or -> notation\n"
21207 # if this is an empty list, (), then it is not an
21208 # error; for example, we might have a constant pi and
21209 # invoke it with pi() or just pi;
21210 my ( $next_nonblank_token, $i_next ) =
21211 find_next_nonblank_token( $i, $rtokens,
21212 $max_token_index );
21213 if ( $next_nonblank_token ne ')' ) {
21215 error_if_expecting_OPERATOR('(');
21217 if ( $last_nonblank_type eq 'C' ) {
21219 "$last_nonblank_token has a void prototype\n";
21221 elsif ( $last_nonblank_type eq 'i' ) {
21223 && $last_nonblank_token =~ /^\$/ )
21226 "Do you mean '$last_nonblank_token->(' ?\n";
21230 interrupt_logfile();
21234 } ## end if ( $next_nonblank_token...
21235 } ## end else [ if ( $last_last_nonblank_token...
21236 } ## end if ( $expecting == OPERATOR...
21238 $paren_type[$paren_depth] = $container_type;
21240 increase_nesting_depth( PAREN, $$rtoken_map[$i_tok] );
21242 # propagate types down through nested parens
21243 # for example: the second paren in 'if ((' would be structural
21244 # since the first is.
21246 if ( $last_nonblank_token eq '(' ) {
21247 $type = $last_nonblank_type;
21250 # We exclude parens as structural after a ',' because it
21251 # causes subtle problems with continuation indentation for
21252 # something like this, where the first 'or' will not get
21257 # ( not defined $check )
21259 # or $check eq "new"
21260 # or $check eq "old",
21263 # Likewise, we exclude parens where a statement can start
21264 # because of problems with continuation indentation, like
21267 # ($firstline =~ /^#\!.*perl/)
21268 # and (print $File::Find::name, "\n")
21271 # (ref($usage_fref) =~ /CODE/)
21273 # : (&blast_usage, &blast_params, &blast_general_params);
21279 if ( $last_nonblank_type eq ')' ) {
21281 "Syntax error? found token '$last_nonblank_type' then '('\n"
21284 $paren_structural_type[$paren_depth] = $type;
21289 decrease_nesting_depth( PAREN, $$rtoken_map[$i_tok] );
21291 if ( $paren_structural_type[$paren_depth] eq '{' ) {
21295 $container_type = $paren_type[$paren_depth];
21297 # /^(for|foreach)$/
21298 if ( $is_for_foreach{ $paren_type[$paren_depth] } ) {
21299 my $num_sc = $paren_semicolon_count[$paren_depth];
21300 if ( $num_sc > 0 && $num_sc != 2 ) {
21301 warning("Expected 2 ';' in 'for(;;)' but saw $num_sc\n");
21305 if ( $paren_depth > 0 ) { $paren_depth-- }
21308 if ( $last_nonblank_type eq ',' ) {
21309 complain("Repeated ','s \n");
21312 # patch for operator_expected: note if we are in the list (use.t)
21313 if ( $statement_type eq 'use' ) { $statement_type = '_use' }
21314 ## FIXME: need to move this elsewhere, perhaps check after a '('
21315 ## elsif ($last_nonblank_token eq '(') {
21316 ## warning("Leading ','s illegal in some versions of perl\n");
21320 $context = UNKNOWN_CONTEXT;
21321 $statement_type = '';
21323 # /^(for|foreach)$/
21324 if ( $is_for_foreach{ $paren_type[$paren_depth] } )
21325 { # mark ; in for loop
21327 # Be careful: we do not want a semicolon such as the
21328 # following to be included:
21330 # for (sort {strcoll($a,$b);} keys %investments) {
21332 if ( $brace_depth == $depth_array[PAREN][BRACE][$paren_depth]
21333 && $square_bracket_depth ==
21334 $depth_array[PAREN][SQUARE_BRACKET][$paren_depth] )
21338 $paren_semicolon_count[$paren_depth]++;
21344 error_if_expecting_OPERATOR("String")
21345 if ( $expecting == OPERATOR );
21348 $allowed_quote_modifiers = "";
21351 error_if_expecting_OPERATOR("String")
21352 if ( $expecting == OPERATOR );
21355 $allowed_quote_modifiers = "";
21358 error_if_expecting_OPERATOR("String")
21359 if ( $expecting == OPERATOR );
21362 $allowed_quote_modifiers = "";
21367 if ( $expecting == UNKNOWN ) { # indeterminte, must guess..
21369 ( $is_pattern, $msg ) =
21370 guess_if_pattern_or_division( $i, $rtokens, $rtoken_map,
21371 $max_token_index );
21374 write_diagnostics("DIVIDE:$msg\n");
21375 write_logfile_entry($msg);
21378 else { $is_pattern = ( $expecting == TERM ) }
21383 $allowed_quote_modifiers = '[cgimosx]';
21385 else { # not a pattern; check for a /= token
21387 if ( $$rtokens[ $i + 1 ] eq '=' ) { # form token /=
21393 #DEBUG - collecting info on what tokens follow a divide
21394 # for development of guessing algorithm
21395 #if ( numerator_expected( $i, $rtokens, $max_token_index ) < 0 ) {
21396 # #write_diagnostics( "DIVIDE? $input_line\n" );
21402 # if we just saw a ')', we will label this block with
21403 # its type. We need to do this to allow sub
21404 # code_block_type to determine if this brace starts a
21405 # code block or anonymous hash. (The type of a paren
21406 # pair is the preceding token, such as 'if', 'else',
21408 $container_type = "";
21410 # ATTRS: for a '{' following an attribute list, reset
21411 # things to look like we just saw the sub name
21412 if ( $statement_type =~ /^sub/ ) {
21413 $last_nonblank_token = $statement_type;
21414 $last_nonblank_type = 'i';
21415 $statement_type = "";
21418 # patch for SWITCH/CASE: hide these keywords from an immediately
21419 # following opening brace
21420 elsif ( ( $statement_type eq 'case' || $statement_type eq 'when' )
21421 && $statement_type eq $last_nonblank_token )
21423 $last_nonblank_token = ";";
21426 elsif ( $last_nonblank_token eq ')' ) {
21427 $last_nonblank_token = $paren_type[ $paren_depth + 1 ];
21429 # defensive move in case of a nesting error (pbug.t)
21430 # in which this ')' had no previous '('
21431 # this nesting error will have been caught
21432 if ( !defined($last_nonblank_token) ) {
21433 $last_nonblank_token = 'if';
21436 # check for syntax error here;
21437 unless ( $is_blocktype_with_paren{$last_nonblank_token} ) {
21438 my $list = join( ' ', sort keys %is_blocktype_with_paren );
21440 "syntax error at ') {', didn't see one of: $list\n");
21444 # patch for paren-less for/foreach glitch, part 2.
21445 # see note below under 'qw'
21446 elsif ($last_nonblank_token eq 'qw'
21447 && $is_for_foreach{$want_paren} )
21449 $last_nonblank_token = $want_paren;
21450 if ( $last_last_nonblank_token eq $want_paren ) {
21452 "syntax error at '$want_paren .. {' -- missing \$ loop variable\n"
21459 # now identify which of the three possible types of
21460 # curly braces we have: hash index container, anonymous
21461 # hash reference, or code block.
21463 # non-structural (hash index) curly brace pair
21464 # get marked 'L' and 'R'
21465 if ( is_non_structural_brace() ) {
21468 # patch for SWITCH/CASE:
21469 # allow paren-less identifier after 'when'
21470 # if the brace is preceded by a space
21471 if ( $statement_type eq 'when'
21472 && $last_nonblank_type eq 'i'
21473 && $last_last_nonblank_type eq 'k'
21474 && ( $i_tok == 0 || $rtoken_type->[ $i_tok - 1 ] eq 'b' ) )
21477 $block_type = $statement_type;
21481 # code and anonymous hash have the same type, '{', but are
21482 # distinguished by 'block_type',
21483 # which will be blank for an anonymous hash
21486 $block_type = code_block_type( $i_tok, $rtokens, $rtoken_type,
21487 $max_token_index );
21489 # patch to promote bareword type to function taking block
21491 && $last_nonblank_type eq 'w'
21492 && $last_nonblank_i >= 0 )
21494 if ( $routput_token_type->[$last_nonblank_i] eq 'w' ) {
21495 $routput_token_type->[$last_nonblank_i] = 'G';
21499 # patch for SWITCH/CASE: if we find a stray opening block brace
21500 # where we might accept a 'case' or 'when' block, then take it
21501 if ( $statement_type eq 'case'
21502 || $statement_type eq 'when' )
21504 if ( !$block_type || $block_type eq '}' ) {
21505 $block_type = $statement_type;
21509 $brace_type[ ++$brace_depth ] = $block_type;
21510 $brace_package[$brace_depth] = $current_package;
21512 increase_nesting_depth( BRACE, $$rtoken_map[$i_tok] );
21513 $brace_structural_type[$brace_depth] = $type;
21514 $brace_context[$brace_depth] = $context;
21515 $brace_statement_type[$brace_depth] = $statement_type;
21518 $block_type = $brace_type[$brace_depth];
21519 if ($block_type) { $statement_type = '' }
21520 if ( defined( $brace_package[$brace_depth] ) ) {
21521 $current_package = $brace_package[$brace_depth];
21524 # can happen on brace error (caught elsewhere)
21528 decrease_nesting_depth( BRACE, $$rtoken_map[$i_tok] );
21530 if ( $brace_structural_type[$brace_depth] eq 'L' ) {
21534 # propagate type information for 'do' and 'eval' blocks.
21535 # This is necessary to enable us to know if an operator
21536 # or term is expected next
21537 if ( $is_block_operator{ $brace_type[$brace_depth] } ) {
21538 $tok = $brace_type[$brace_depth];
21541 $context = $brace_context[$brace_depth];
21542 $statement_type = $brace_statement_type[$brace_depth];
21543 if ( $brace_depth > 0 ) { $brace_depth--; }
21545 '&' => sub { # maybe sub call? start looking
21547 # We have to check for sub call unless we are sure we
21548 # are expecting an operator. This example from s2p
21549 # got mistaken as a q operator in an early version:
21550 # print BODY &q(<<'EOT');
21551 if ( $expecting != OPERATOR ) {
21557 '<' => sub { # angle operator or less than?
21559 if ( $expecting != OPERATOR ) {
21561 find_angle_operator_termination( $input_line, $i, $rtoken_map,
21562 $expecting, $max_token_index );
21568 '?' => sub { # ?: conditional or starting pattern?
21572 if ( $expecting == UNKNOWN ) {
21575 ( $is_pattern, $msg ) =
21576 guess_if_pattern_or_conditional( $i, $rtokens, $rtoken_map,
21577 $max_token_index );
21579 if ($msg) { write_logfile_entry($msg) }
21581 else { $is_pattern = ( $expecting == TERM ) }
21586 $allowed_quote_modifiers = '[cgimosx]'; # TBD:check this
21590 increase_nesting_depth( QUESTION_COLON,
21591 $$rtoken_map[$i_tok] );
21594 '*' => sub { # typeglob, or multiply?
21596 if ( $expecting == TERM ) {
21601 if ( $$rtokens[ $i + 1 ] eq '=' ) {
21606 elsif ( $$rtokens[ $i + 1 ] eq '*' ) {
21610 if ( $$rtokens[ $i + 1 ] eq '=' ) {
21618 '.' => sub { # what kind of . ?
21620 if ( $expecting != OPERATOR ) {
21622 if ( $type eq '.' ) {
21623 error_if_expecting_TERM()
21624 if ( $expecting == TERM );
21632 # if this is the first nonblank character, call it a label
21633 # since perl seems to just swallow it
21634 if ( $input_line_number == 1 && $last_nonblank_i == -1 ) {
21638 # ATTRS: check for a ':' which introduces an attribute list
21639 # (this might eventually get its own token type)
21640 elsif ( $statement_type =~ /^sub/ ) {
21642 $in_attribute_list = 1;
21645 # check for scalar attribute, such as
21646 # my $foo : shared = 1;
21647 elsif ($is_my_our{$statement_type}
21648 && $current_depth[QUESTION_COLON] == 0 )
21651 $in_attribute_list = 1;
21654 # otherwise, it should be part of a ?/: operator
21657 decrease_nesting_depth( QUESTION_COLON,
21658 $$rtoken_map[$i_tok] );
21659 if ( $last_nonblank_token eq '?' ) {
21660 warning("Syntax error near ? :\n");
21664 '+' => sub { # what kind of plus?
21666 if ( $expecting == TERM ) {
21667 my $number = scan_number();
21669 # unary plus is safest assumption if not a number
21670 if ( !defined($number) ) { $type = 'p'; }
21672 elsif ( $expecting == OPERATOR ) {
21675 if ( $next_type eq 'w' ) { $type = 'p' }
21680 error_if_expecting_OPERATOR("Array")
21681 if ( $expecting == OPERATOR );
21684 '%' => sub { # hash or modulo?
21686 # first guess is hash if no following blank
21687 if ( $expecting == UNKNOWN ) {
21688 if ( $next_type ne 'b' ) { $expecting = TERM }
21690 if ( $expecting == TERM ) {
21695 $square_bracket_type[ ++$square_bracket_depth ] =
21696 $last_nonblank_token;
21698 increase_nesting_depth( SQUARE_BRACKET, $$rtoken_map[$i_tok] );
21700 # It may seem odd, but structural square brackets have
21701 # type '{' and '}'. This simplifies the indentation logic.
21702 if ( !is_non_structural_brace() ) {
21705 $square_bracket_structural_type[$square_bracket_depth] = $type;
21709 decrease_nesting_depth( SQUARE_BRACKET, $$rtoken_map[$i_tok] );
21711 if ( $square_bracket_structural_type[$square_bracket_depth] eq '{' )
21715 if ( $square_bracket_depth > 0 ) { $square_bracket_depth--; }
21717 '-' => sub { # what kind of minus?
21719 if ( ( $expecting != OPERATOR )
21720 && $is_file_test_operator{$next_tok} )
21726 elsif ( $expecting == TERM ) {
21727 my $number = scan_number();
21729 # maybe part of bareword token? unary is safest
21730 if ( !defined($number) ) { $type = 'm'; }
21733 elsif ( $expecting == OPERATOR ) {
21737 if ( $next_type eq 'w' ) {
21745 # check for special variables like ${^WARNING_BITS}
21746 if ( $expecting == TERM ) {
21748 # FIXME: this should work but will not catch errors
21749 # because we also have to be sure that previous token is
21750 # a type character ($,@,%).
21751 if ( $last_nonblank_token eq '{'
21752 && ( $next_tok =~ /^[A-Za-z_]/ ) )
21755 if ( $next_tok eq 'W' ) {
21756 $tokenizer_self->{_saw_perl_dash_w} = 1;
21758 $tok = $tok . $next_tok;
21764 unless ( error_if_expecting_TERM() ) {
21766 # Something like this is valid but strange:
21768 complain("The '^' seems unusual here\n");
21774 '::' => sub { # probably a sub call
21775 scan_bare_identifier();
21777 '<<' => sub { # maybe a here-doc?
21779 unless ( $i < $max_token_index )
21780 ; # here-doc not possible if end of line
21782 if ( $expecting != OPERATOR ) {
21783 my ( $found_target, $here_doc_target, $here_quote_character,
21786 $found_target, $here_doc_target, $here_quote_character, $i,
21789 = find_here_doc( $expecting, $i, $rtokens, $rtoken_map,
21790 $max_token_index );
21792 if ($found_target) {
21793 push @{$rhere_target_list},
21794 [ $here_doc_target, $here_quote_character ];
21796 if ( length($here_doc_target) > 80 ) {
21797 my $truncated = substr( $here_doc_target, 0, 80 );
21798 complain("Long here-target: '$truncated' ...\n");
21800 elsif ( $here_doc_target !~ /^[A-Z_]\w+$/ ) {
21802 "Unconventional here-target: '$here_doc_target'\n"
21806 elsif ( $expecting == TERM ) {
21807 unless ($saw_error) {
21809 # shouldn't happen..
21810 warning("Program bug; didn't find here doc target\n");
21811 report_definite_bug();
21820 # if -> points to a bare word, we must scan for an identifier,
21821 # otherwise something like ->y would look like the y operator
21825 # type = 'pp' for pre-increment, '++' for post-increment
21827 if ( $expecting == TERM ) { $type = 'pp' }
21828 elsif ( $expecting == UNKNOWN ) {
21829 my ( $next_nonblank_token, $i_next ) =
21830 find_next_nonblank_token( $i, $rtokens, $max_token_index );
21831 if ( $next_nonblank_token eq '$' ) { $type = 'pp' }
21836 if ( $last_nonblank_type eq $tok ) {
21837 complain("Repeated '=>'s \n");
21840 # patch for operator_expected: note if we are in the list (use.t)
21841 # TODO: make version numbers a new token type
21842 if ( $statement_type eq 'use' ) { $statement_type = '_use' }
21845 # type = 'mm' for pre-decrement, '--' for post-decrement
21848 if ( $expecting == TERM ) { $type = 'mm' }
21849 elsif ( $expecting == UNKNOWN ) {
21850 my ( $next_nonblank_token, $i_next ) =
21851 find_next_nonblank_token( $i, $rtokens, $max_token_index );
21852 if ( $next_nonblank_token eq '$' ) { $type = 'mm' }
21857 error_if_expecting_TERM()
21858 if ( $expecting == TERM );
21862 error_if_expecting_TERM()
21863 if ( $expecting == TERM );
21867 error_if_expecting_TERM()
21868 if ( $expecting == TERM );
21872 # ------------------------------------------------------------
21873 # end hash of code for handling individual token types
21874 # ------------------------------------------------------------
21876 my %matching_start_token = ( '}' => '{', ']' => '[', ')' => '(' );
21878 # These block types terminate statements and do not need a trailing
21880 # patched for SWITCH/CASE:
21881 my %is_zero_continuation_block_type;
21882 @_ = qw( } { BEGIN END CHECK INIT AUTOLOAD DESTROY continue ;
21883 if elsif else unless while until for foreach switch case given when);
21884 @is_zero_continuation_block_type{@_} = (1) x scalar(@_);
21886 my %is_not_zero_continuation_block_type;
21887 @_ = qw(sort grep map do eval);
21888 @is_not_zero_continuation_block_type{@_} = (1) x scalar(@_);
21890 my %is_logical_container;
21891 @_ = qw(if elsif unless while and or err not && ! || for foreach);
21892 @is_logical_container{@_} = (1) x scalar(@_);
21894 my %is_binary_type;
21896 @is_binary_type{@_} = (1) x scalar(@_);
21898 my %is_binary_keyword;
21899 @_ = qw(and or err eq ne cmp);
21900 @is_binary_keyword{@_} = (1) x scalar(@_);
21902 # 'L' is token for opening { at hash key
21903 my %is_opening_type;
21904 @_ = qw" L { ( [ ";
21905 @is_opening_type{@_} = (1) x scalar(@_);
21907 # 'R' is token for closing } at hash key
21908 my %is_closing_type;
21909 @_ = qw" R } ) ] ";
21910 @is_closing_type{@_} = (1) x scalar(@_);
21912 my %is_redo_last_next_goto;
21913 @_ = qw(redo last next goto);
21914 @is_redo_last_next_goto{@_} = (1) x scalar(@_);
21916 my %is_use_require;
21917 @_ = qw(use require);
21918 @is_use_require{@_} = (1) x scalar(@_);
21920 my %is_sub_package;
21921 @_ = qw(sub package);
21922 @is_sub_package{@_} = (1) x scalar(@_);
21924 # This hash holds the hash key in $tokenizer_self for these keywords:
21925 my %is_format_END_DATA = (
21926 'format' => '_in_format',
21927 '__END__' => '_in_end',
21928 '__DATA__' => '_in_data',
21931 # ref: camel 3 p 147,
21932 # but perl may accept undocumented flags
21933 my %quote_modifiers = (
21934 's' => '[cegimosx]',
21937 'm' => '[cgimosx]',
21945 # table showing how many quoted things to look for after quote operator..
21946 # s, y, tr have 2 (pattern and replacement)
21947 # others have 1 (pattern only)
21948 my %quote_items = (
21960 sub tokenize_this_line {
21962 # This routine breaks a line of perl code into tokens which are of use in
21963 # indentation and reformatting. One of my goals has been to define tokens
21964 # such that a newline may be inserted between any pair of tokens without
21965 # changing or invalidating the program. This version comes close to this,
21966 # although there are necessarily a few exceptions which must be caught by
21967 # the formatter. Many of these involve the treatment of bare words.
21969 # The tokens and their types are returned in arrays. See previous
21970 # routine for their names.
21972 # See also the array "valid_token_types" in the BEGIN section for an
21975 # To simplify things, token types are either a single character, or they
21976 # are identical to the tokens themselves.
21978 # As a debugging aid, the -D flag creates a file containing a side-by-side
21979 # comparison of the input string and its tokenization for each line of a file.
21980 # This is an invaluable debugging aid.
21982 # In addition to tokens, and some associated quantities, the tokenizer
21983 # also returns flags indication any special line types. These include
21984 # quotes, here_docs, formats.
21986 # -----------------------------------------------------------------------
21988 # How to add NEW_TOKENS:
21990 # New token types will undoubtedly be needed in the future both to keep up
21991 # with changes in perl and to help adapt the tokenizer to other applications.
21993 # Here are some notes on the minimal steps. I wrote these notes while
21994 # adding the 'v' token type for v-strings, which are things like version
21995 # numbers 5.6.0, and ip addresses, and will use that as an example. ( You
21996 # can use your editor to search for the string "NEW_TOKENS" to find the
21997 # appropriate sections to change):
21999 # *. Try to talk somebody else into doing it! If not, ..
22001 # *. Make a backup of your current version in case things don't work out!
22003 # *. Think of a new, unused character for the token type, and add to
22004 # the array @valid_token_types in the BEGIN section of this package.
22005 # For example, I used 'v' for v-strings.
22007 # *. Implement coding to recognize the $type of the token in this routine.
22008 # This is the hardest part, and is best done by immitating or modifying
22009 # some of the existing coding. For example, to recognize v-strings, I
22010 # patched 'sub scan_bare_identifier' to recognize v-strings beginning with
22011 # 'v' and 'sub scan_number' to recognize v-strings without the leading 'v'.
22013 # *. Update sub operator_expected. This update is critically important but
22014 # the coding is trivial. Look at the comments in that routine for help.
22015 # For v-strings, which should behave like numbers, I just added 'v' to the
22016 # regex used to handle numbers and strings (types 'n' and 'Q').
22018 # *. Implement a 'bond strength' rule in sub set_bond_strengths in
22019 # Perl::Tidy::Formatter for breaking lines around this token type. You can
22020 # skip this step and take the default at first, then adjust later to get
22021 # desired results. For adding type 'v', I looked at sub bond_strength and
22022 # saw that number type 'n' was using default strengths, so I didn't do
22023 # anything. I may tune it up someday if I don't like the way line
22024 # breaks with v-strings look.
22026 # *. Implement a 'whitespace' rule in sub set_white_space_flag in
22027 # Perl::Tidy::Formatter. For adding type 'v', I looked at this routine
22028 # and saw that type 'n' used spaces on both sides, so I just added 'v'
22029 # to the array @spaces_both_sides.
22031 # *. Update HtmlWriter package so that users can colorize the token as
22032 # desired. This is quite easy; see comments identified by 'NEW_TOKENS' in
22033 # that package. For v-strings, I initially chose to use a default color
22034 # equal to the default for numbers, but it might be nice to change that
22037 # *. Update comments in Perl::Tidy::Tokenizer::dump_token_types.
22039 # *. Run lots and lots of debug tests. Start with special files designed
22040 # to test the new token type. Run with the -D flag to create a .DEBUG
22041 # file which shows the tokenization. When these work ok, test as many old
22042 # scripts as possible. Start with all of the '.t' files in the 'test'
22043 # directory of the distribution file. Compare .tdy output with previous
22044 # version and updated version to see the differences. Then include as
22045 # many more files as possible. My own technique has been to collect a huge
22046 # number of perl scripts (thousands!) into one directory and run perltidy
22047 # *, then run diff between the output of the previous version and the
22050 # *. For another example, search for the smartmatch operator '~~'
22051 # with your editor to see where updates were made for it.
22053 # -----------------------------------------------------------------------
22055 my $line_of_tokens = shift;
22056 my ($untrimmed_input_line) = $line_of_tokens->{_line_text};
22058 # patch while coding change is underway
22059 # make callers private data to allow access
22060 # $tokenizer_self = $caller_tokenizer_self;
22062 # extract line number for use in error messages
22063 $input_line_number = $line_of_tokens->{_line_number};
22065 # reinitialize for multi-line quote
22066 $line_of_tokens->{_starting_in_quote} = $in_quote && $quote_type eq 'Q';
22068 # check for pod documentation
22069 if ( ( $untrimmed_input_line =~ /^=[A-Za-z_]/ ) ) {
22071 # must not be in multi-line quote
22072 # and must not be in an eqn
22073 if ( !$in_quote and ( operator_expected( 'b', '=', 'b' ) == TERM ) )
22075 $tokenizer_self->{_in_pod} = 1;
22080 $input_line = $untrimmed_input_line;
22084 # trim start of this line unless we are continuing a quoted line
22085 # do not trim end because we might end in a quote (test: deken4.pl)
22086 # Perl::Tidy::Formatter will delete needless trailing blanks
22087 unless ( $in_quote && ( $quote_type eq 'Q' ) ) {
22088 $input_line =~ s/^\s*//; # trim left end
22091 # update the copy of the line for use in error messages
22092 # This must be exactly what we give the pre_tokenizer
22093 $tokenizer_self->{_line_text} = $input_line;
22095 # re-initialize for the main loop
22096 $routput_token_list = []; # stack of output token indexes
22097 $routput_token_type = []; # token types
22098 $routput_block_type = []; # types of code block
22099 $routput_container_type = []; # paren types, such as if, elsif, ..
22100 $routput_type_sequence = []; # nesting sequential number
22102 $rhere_target_list = [];
22104 $tok = $last_nonblank_token;
22105 $type = $last_nonblank_type;
22106 $prototype = $last_nonblank_prototype;
22107 $last_nonblank_i = -1;
22108 $block_type = $last_nonblank_block_type;
22109 $container_type = $last_nonblank_container_type;
22110 $type_sequence = $last_nonblank_type_sequence;
22113 # tokenization is done in two stages..
22114 # stage 1 is a very simple pre-tokenization
22115 my $max_tokens_wanted = 0; # this signals pre_tokenize to get all tokens
22117 # a little optimization for a full-line comment
22118 if ( !$in_quote && ( $input_line =~ /^#/ ) ) {
22119 $max_tokens_wanted = 1 # no use tokenizing a comment
22122 # start by breaking the line into pre-tokens
22123 ( $rtokens, $rtoken_map, $rtoken_type ) =
22124 pre_tokenize( $input_line, $max_tokens_wanted );
22126 $max_token_index = scalar(@$rtokens) - 1;
22127 push( @$rtokens, ' ', ' ', ' ' ); # extra whitespace simplifies logic
22128 push( @$rtoken_map, 0, 0, 0 ); # shouldn't be referenced
22129 push( @$rtoken_type, 'b', 'b', 'b' );
22131 # initialize for main loop
22132 for $i ( 0 .. $max_token_index + 3 ) {
22133 $routput_token_type->[$i] = "";
22134 $routput_block_type->[$i] = "";
22135 $routput_container_type->[$i] = "";
22136 $routput_type_sequence->[$i] = "";
22141 # ------------------------------------------------------------
22142 # begin main tokenization loop
22143 # ------------------------------------------------------------
22145 # we are looking at each pre-token of one line and combining them
22147 while ( ++$i <= $max_token_index ) {
22149 if ($in_quote) { # continue looking for end of a quote
22150 $type = $quote_type;
22152 unless ( @{$routput_token_list} )
22153 { # initialize if continuation line
22154 push( @{$routput_token_list}, $i );
22155 $routput_token_type->[$i] = $type;
22158 $tok = $quote_character unless ( $quote_character =~ /^\s*$/ );
22160 # scan for the end of the quote or pattern
22162 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
22163 $quoted_string_1, $quoted_string_2
22166 $i, $in_quote, $quote_character,
22167 $quote_pos, $quote_depth, $quoted_string_1,
22168 $quoted_string_2, $rtokens, $rtoken_map,
22172 # all done if we didn't find it
22173 last if ($in_quote);
22175 # save pattern and replacement text for rescanning
22176 my $qs1 = $quoted_string_1;
22177 my $qs2 = $quoted_string_2;
22179 # re-initialize for next search
22180 $quote_character = '';
22183 $quoted_string_1 = "";
22184 $quoted_string_2 = "";
22185 last if ( ++$i > $max_token_index );
22187 # look for any modifiers
22188 if ($allowed_quote_modifiers) {
22190 # check for exact quote modifiers
22191 if ( $$rtokens[$i] =~ /^[A-Za-z_]/ ) {
22192 my $str = $$rtokens[$i];
22193 my $saw_modifier_e;
22194 while ( $str =~ /\G$allowed_quote_modifiers/gc ) {
22195 my $pos = pos($str);
22196 my $char = substr( $str, $pos - 1, 1 );
22197 $saw_modifier_e ||= ( $char eq 'e' );
22200 # For an 'e' quote modifier we must scan the replacement
22201 # text for here-doc targets.
22202 if ($saw_modifier_e) {
22204 my $rht = scan_replacement_text($qs1);
22206 # Change type from 'Q' to 'h' for quotes with
22207 # here-doc targets so that the formatter (see sub
22208 # print_line_of_tokens) will not make any line
22209 # breaks after this point.
22211 push @{$rhere_target_list}, @{$rht};
22213 if ( $i_tok < 0 ) {
22214 my $ilast = $routput_token_list->[-1];
22215 $routput_token_type->[$ilast] = $type;
22220 if ( defined( pos($str) ) ) {
22223 if ( pos($str) == length($str) ) {
22224 last if ( ++$i > $max_token_index );
22227 # Looks like a joined quote modifier
22228 # and keyword, maybe something like
22229 # s/xxx/yyy/gefor @k=...
22230 # Example is "galgen.pl". Would have to split
22231 # the word and insert a new token in the
22232 # pre-token list. This is so rare that I haven't
22233 # done it. Will just issue a warning citation.
22235 # This error might also be triggered if my quote
22236 # modifier characters are incomplete
22240 Partial match to quote modifier $allowed_quote_modifiers at word: '$str'
22241 Please put a space between quote modifiers and trailing keywords.
22244 # print "token $$rtokens[$i]\n";
22245 # my $num = length($str) - pos($str);
22246 # $$rtokens[$i]=substr($$rtokens[$i],pos($str),$num);
22247 # print "continuing with new token $$rtokens[$i]\n";
22249 # skipping past this token does least damage
22250 last if ( ++$i > $max_token_index );
22255 # example file: rokicki4.pl
22256 # This error might also be triggered if my quote
22257 # modifier characters are incomplete
22258 write_logfile_entry(
22259 "Note: found word $str at quote modifier location\n"
22265 $allowed_quote_modifiers = "";
22269 unless ( $tok =~ /^\s*$/ ) {
22271 # try to catch some common errors
22272 if ( ( $type eq 'n' ) && ( $tok ne '0' ) ) {
22274 if ( $last_nonblank_token eq 'eq' ) {
22275 complain("Should 'eq' be '==' here ?\n");
22277 elsif ( $last_nonblank_token eq 'ne' ) {
22278 complain("Should 'ne' be '!=' here ?\n");
22282 $last_last_nonblank_token = $last_nonblank_token;
22283 $last_last_nonblank_type = $last_nonblank_type;
22284 $last_last_nonblank_block_type = $last_nonblank_block_type;
22285 $last_last_nonblank_container_type =
22286 $last_nonblank_container_type;
22287 $last_last_nonblank_type_sequence =
22288 $last_nonblank_type_sequence;
22289 $last_nonblank_token = $tok;
22290 $last_nonblank_type = $type;
22291 $last_nonblank_prototype = $prototype;
22292 $last_nonblank_block_type = $block_type;
22293 $last_nonblank_container_type = $container_type;
22294 $last_nonblank_type_sequence = $type_sequence;
22295 $last_nonblank_i = $i_tok;
22298 # store previous token type
22299 if ( $i_tok >= 0 ) {
22300 $routput_token_type->[$i_tok] = $type;
22301 $routput_block_type->[$i_tok] = $block_type;
22302 $routput_container_type->[$i_tok] = $container_type;
22303 $routput_type_sequence->[$i_tok] = $type_sequence;
22305 my $pre_tok = $$rtokens[$i]; # get the next pre-token
22306 my $pre_type = $$rtoken_type[$i]; # and type
22308 $type = $pre_type; # to be modified as necessary
22309 $block_type = ""; # blank for all tokens except code block braces
22310 $container_type = ""; # blank for all tokens except some parens
22311 $type_sequence = ""; # blank for all tokens except ?/:
22312 $prototype = ""; # blank for all tokens except user defined subs
22315 # this pre-token will start an output token
22316 push( @{$routput_token_list}, $i_tok );
22318 # continue gathering identifier if necessary
22319 # but do not start on blanks and comments
22320 if ( $id_scan_state && $pre_type !~ /[b#]/ ) {
22322 if ( $id_scan_state =~ /^(sub|package)/ ) {
22329 last if ($id_scan_state);
22330 next if ( ( $i > 0 ) || $type );
22332 # didn't find any token; start over
22337 # handle whitespace tokens..
22338 next if ( $type eq 'b' );
22339 my $prev_tok = $i > 0 ? $$rtokens[ $i - 1 ] : ' ';
22340 my $prev_type = $i > 0 ? $$rtoken_type[ $i - 1 ] : 'b';
22342 # Build larger tokens where possible, since we are not in a quote.
22344 # First try to assemble digraphs. The following tokens are
22345 # excluded and handled specially:
22346 # '/=' is excluded because the / might start a pattern.
22347 # 'x=' is excluded since it might be $x=, with $ on previous line
22348 # '**' and *= might be typeglobs of punctuation variables
22349 # I have allowed tokens starting with <, such as <=,
22350 # because I don't think these could be valid angle operators.
22351 # test file: storrs4.pl
22352 my $test_tok = $tok . $$rtokens[ $i + 1 ];
22353 my $combine_ok = $is_digraph{$test_tok};
22355 # check for special cases which cannot be combined
22358 # '//' must be defined_or operator if an operator is expected.
22359 # TODO: Code for other ambiguous digraphs (/=, x=, **, *=)
22360 # could be migrated here for clarity
22361 if ( $test_tok eq '//' ) {
22362 my $next_type = $$rtokens[ $i + 1 ];
22364 operator_expected( $prev_type, $tok, $next_type );
22365 $combine_ok = 0 unless ( $expecting == OPERATOR );
22371 && ( $test_tok ne '/=' ) # might be pattern
22372 && ( $test_tok ne 'x=' ) # might be $x
22373 && ( $test_tok ne '**' ) # typeglob?
22374 && ( $test_tok ne '*=' ) # typeglob?
22380 # Now try to assemble trigraphs. Note that all possible
22381 # perl trigraphs can be constructed by appending a character
22383 $test_tok = $tok . $$rtokens[ $i + 1 ];
22385 if ( $is_trigraph{$test_tok} ) {
22392 $next_tok = $$rtokens[ $i + 1 ];
22393 $next_type = $$rtoken_type[ $i + 1 ];
22395 TOKENIZER_DEBUG_FLAG_TOKENIZE && do {
22398 $last_nonblank_token, $tok,
22399 $next_tok, $brace_depth,
22400 $brace_type[$brace_depth], $paren_depth,
22401 $paren_type[$paren_depth]
22403 print "TOKENIZE:(@debug_list)\n";
22406 # turn off attribute list on first non-blank, non-bareword
22407 if ( $pre_type ne 'w' ) { $in_attribute_list = 0 }
22409 ###############################################################
22410 # We have the next token, $tok.
22411 # Now we have to examine this token and decide what it is
22412 # and define its $type
22414 # section 1: bare words
22415 ###############################################################
22417 if ( $pre_type eq 'w' ) {
22418 $expecting = operator_expected( $prev_type, $tok, $next_type );
22419 my ( $next_nonblank_token, $i_next ) =
22420 find_next_nonblank_token( $i, $rtokens, $max_token_index );
22422 # ATTRS: handle sub and variable attributes
22423 if ($in_attribute_list) {
22425 # treat bare word followed by open paren like qw(
22426 if ( $next_nonblank_token eq '(' ) {
22427 $in_quote = $quote_items{'q'};
22428 $allowed_quote_modifiers = $quote_modifiers{'q'};
22434 # handle bareword not followed by open paren
22441 # quote a word followed by => operator
22442 if ( $next_nonblank_token eq '=' ) {
22444 if ( $$rtokens[ $i_next + 1 ] eq '>' ) {
22445 if ( $is_constant{$current_package}{$tok} ) {
22448 elsif ( $is_user_function{$current_package}{$tok} ) {
22451 $user_function_prototype{$current_package}{$tok};
22453 elsif ( $tok =~ /^v\d+$/ ) {
22455 report_v_string($tok);
22457 else { $type = 'w' }
22463 # quote a bare word within braces..like xxx->{s}; note that we
22464 # must be sure this is not a structural brace, to avoid
22465 # mistaking {s} in the following for a quoted bare word:
22466 # for(@[){s}bla}BLA}
22467 if ( ( $last_nonblank_type eq 'L' )
22468 && ( $next_nonblank_token eq '}' ) )
22474 # a bare word immediately followed by :: is not a keyword;
22475 # use $tok_kw when testing for keywords to avoid a mistake
22477 if ( $$rtokens[ $i + 1 ] eq ':' && $$rtokens[ $i + 2 ] eq ':' )
22482 # handle operator x (now we know it isn't $x=)
22483 if ( ( $tok =~ /^x\d*$/ ) && ( $expecting == OPERATOR ) ) {
22484 if ( $tok eq 'x' ) {
22486 if ( $$rtokens[ $i + 1 ] eq '=' ) { # x=
22496 # FIXME: Patch: mark something like x4 as an integer for now
22497 # It gets fixed downstream. This is easier than
22498 # splitting the pretoken.
22504 elsif ( ( $tok eq 'strict' )
22505 and ( $last_nonblank_token eq 'use' ) )
22507 $tokenizer_self->{_saw_use_strict} = 1;
22508 scan_bare_identifier();
22511 elsif ( ( $tok eq 'warnings' )
22512 and ( $last_nonblank_token eq 'use' ) )
22514 $tokenizer_self->{_saw_perl_dash_w} = 1;
22516 # scan as identifier, so that we pick up something like:
22517 # use warnings::register
22518 scan_bare_identifier();
22522 $tok eq 'AutoLoader'
22523 && $tokenizer_self->{_look_for_autoloader}
22525 $last_nonblank_token eq 'use'
22527 # these regexes are from AutoSplit.pm, which we want
22529 || $input_line =~ /^\s*(use|require)\s+AutoLoader\b/
22530 || $input_line =~ /\bISA\s*=.*\bAutoLoader\b/
22534 write_logfile_entry("AutoLoader seen, -nlal deactivates\n");
22535 $tokenizer_self->{_saw_autoloader} = 1;
22536 $tokenizer_self->{_look_for_autoloader} = 0;
22537 scan_bare_identifier();
22541 $tok eq 'SelfLoader'
22542 && $tokenizer_self->{_look_for_selfloader}
22543 && ( $last_nonblank_token eq 'use'
22544 || $input_line =~ /^\s*(use|require)\s+SelfLoader\b/
22545 || $input_line =~ /\bISA\s*=.*\bSelfLoader\b/ )
22548 write_logfile_entry("SelfLoader seen, -nlsl deactivates\n");
22549 $tokenizer_self->{_saw_selfloader} = 1;
22550 $tokenizer_self->{_look_for_selfloader} = 0;
22551 scan_bare_identifier();
22554 elsif ( ( $tok eq 'constant' )
22555 and ( $last_nonblank_token eq 'use' ) )
22557 scan_bare_identifier();
22558 my ( $next_nonblank_token, $i_next ) =
22559 find_next_nonblank_token( $i, $rtokens,
22560 $max_token_index );
22562 if ($next_nonblank_token) {
22564 if ( $is_keyword{$next_nonblank_token} ) {
22566 "Attempting to define constant '$next_nonblank_token' which is a perl keyword\n"
22570 # FIXME: could check for error in which next token is
22571 # not a word (number, punctuation, ..)
22573 $is_constant{$current_package}
22574 {$next_nonblank_token} = 1;
22579 # various quote operators
22580 elsif ( $is_q_qq_qw_qx_qr_s_y_tr_m{$tok} ) {
22581 if ( $expecting == OPERATOR ) {
22583 # patch for paren-less for/foreach glitch, part 1
22584 # perl will accept this construct as valid:
22586 # foreach my $key qw\Uno Due Tres Quadro\ {
22587 # print "Set $key\n";
22589 unless ( $tok eq 'qw' && $is_for_foreach{$want_paren} )
22591 error_if_expecting_OPERATOR();
22594 $in_quote = $quote_items{$tok};
22595 $allowed_quote_modifiers = $quote_modifiers{$tok};
22597 # All quote types are 'Q' except possibly qw quotes.
22598 # qw quotes are special in that they may generally be trimmed
22599 # of leading and trailing whitespace. So they are given a
22600 # separate type, 'q', unless requested otherwise.
22602 ( $tok eq 'qw' && $tokenizer_self->{_trim_qw} )
22605 $quote_type = $type;
22608 # check for a statement label
22610 ( $next_nonblank_token eq ':' )
22611 && ( $$rtokens[ $i_next + 1 ] ne ':' )
22612 && ( $i_next <= $max_token_index ) # colon on same line
22616 if ( $tok !~ /A-Z/ ) {
22617 push @{ $tokenizer_self->{_rlower_case_labels_at} },
22618 $input_line_number;
22626 # 'sub' || 'package'
22627 elsif ( $is_sub_package{$tok_kw} ) {
22628 error_if_expecting_OPERATOR()
22629 if ( $expecting == OPERATOR );
22633 # Note on token types for format, __DATA__, __END__:
22634 # It simplifies things to give these type ';', so that when we
22635 # start rescanning we will be expecting a token of type TERM.
22636 # We will switch to type 'k' before outputting the tokens.
22637 elsif ( $is_format_END_DATA{$tok_kw} ) {
22638 $type = ';'; # make tokenizer look for TERM next
22639 $tokenizer_self->{ $is_format_END_DATA{$tok_kw} } = 1;
22643 elsif ( $is_keyword{$tok_kw} ) {
22646 # Since for and foreach may not be followed immediately
22647 # by an opening paren, we have to remember which keyword
22648 # is associated with the next '('
22649 if ( $is_for_foreach{$tok} ) {
22650 if ( new_statement_ok() ) {
22651 $want_paren = $tok;
22655 # recognize 'use' statements, which are special
22656 elsif ( $is_use_require{$tok} ) {
22657 $statement_type = $tok;
22658 error_if_expecting_OPERATOR()
22659 if ( $expecting == OPERATOR );
22662 # remember my and our to check for trailing ": shared"
22663 elsif ( $is_my_our{$tok} ) {
22664 $statement_type = $tok;
22667 # Check for misplaced 'elsif' and 'else', but allow isolated
22668 # else or elsif blocks to be formatted. This is indicated
22669 # by a last noblank token of ';'
22670 elsif ( $tok eq 'elsif' ) {
22671 if ( $last_nonblank_token ne ';'
22672 && $last_nonblank_block_type !~
22673 /^(if|elsif|unless)$/ )
22676 "expecting '$tok' to follow one of 'if|elsif|unless'\n"
22680 elsif ( $tok eq 'else' ) {
22682 # patched for SWITCH/CASE
22683 if ( $last_nonblank_token ne ';'
22684 && $last_nonblank_block_type !~
22685 /^(if|elsif|unless|case|when)$/ )
22688 "expecting '$tok' to follow one of 'if|elsif|unless|case|when'\n"
22692 elsif ( $tok eq 'continue' ) {
22693 if ( $last_nonblank_token ne ';'
22694 && $last_nonblank_block_type !~
22695 /(^(\{|\}|;|while|until|for|foreach)|:$)/ )
22698 # note: ';' '{' and '}' in list above
22699 # because continues can follow bare blocks;
22700 # ':' is labeled block
22701 warning("'$tok' should follow a block\n");
22705 # patch for SWITCH/CASE if 'case' and 'when are
22706 # treated as keywords.
22707 elsif ( $tok eq 'when' || $tok eq 'case' ) {
22708 $statement_type = $tok; # next '{' is block
22712 # check for inline label following
22713 # /^(redo|last|next|goto)$/
22714 elsif (( $last_nonblank_type eq 'k' )
22715 && ( $is_redo_last_next_goto{$last_nonblank_token} ) )
22721 # something else --
22724 scan_bare_identifier();
22725 if ( $type eq 'w' ) {
22727 if ( $expecting == OPERATOR ) {
22729 # don't complain about possible indirect object
22733 # sub new($) { ... }
22734 # $b = new A::; # calls A::new
22735 # $c = new A; # same thing but suspicious
22736 # This will call A::new but we have a 'new' in
22737 # main:: which looks like a constant.
22739 if ( $last_nonblank_type eq 'C' ) {
22740 if ( $tok !~ /::$/ ) {
22742 Expecting operator after '$last_nonblank_token' but found bare word '$tok'
22743 Maybe indirectet object notation?
22748 error_if_expecting_OPERATOR("bareword");
22752 # mark bare words immediately followed by a paren as
22754 $next_tok = $$rtokens[ $i + 1 ];
22755 if ( $next_tok eq '(' ) {
22759 # underscore after file test operator is file handle
22760 if ( $tok eq '_' && $last_nonblank_type eq 'F' ) {
22764 # patch for SWITCH/CASE if 'case' and 'when are
22765 # not treated as keywords:
22769 && $brace_type[$brace_depth] eq 'switch'
22771 || ( $tok eq 'when'
22772 && $brace_type[$brace_depth] eq 'given' )
22775 $statement_type = $tok; # next '{' is block
22776 $type = 'k'; # for keyword syntax coloring
22779 # patch for SWITCH/CASE if switch and given not keywords
22780 # Switch is not a perl 5 keyword, but we will gamble
22781 # and mark switch followed by paren as a keyword. This
22782 # is only necessary to get html syntax coloring nice,
22783 # and does not commit this as being a switch/case.
22784 if ( $next_nonblank_token eq '('
22785 && ( $tok eq 'switch' || $tok eq 'given' ) )
22787 $type = 'k'; # for keyword syntax coloring
22793 ###############################################################
22794 # section 2: strings of digits
22795 ###############################################################
22796 elsif ( $pre_type eq 'd' ) {
22797 $expecting = operator_expected( $prev_type, $tok, $next_type );
22798 error_if_expecting_OPERATOR("Number")
22799 if ( $expecting == OPERATOR );
22800 my $number = scan_number();
22801 if ( !defined($number) ) {
22803 # shouldn't happen - we should always get a number
22804 warning("non-number beginning with digit--program bug\n");
22805 report_definite_bug();
22809 ###############################################################
22810 # section 3: all other tokens
22811 ###############################################################
22814 last if ( $tok eq '#' );
22815 my $code = $tokenization_code->{$tok};
22818 operator_expected( $prev_type, $tok, $next_type );
22825 # -----------------------------
22826 # end of main tokenization loop
22827 # -----------------------------
22829 if ( $i_tok >= 0 ) {
22830 $routput_token_type->[$i_tok] = $type;
22831 $routput_block_type->[$i_tok] = $block_type;
22832 $routput_container_type->[$i_tok] = $container_type;
22833 $routput_type_sequence->[$i_tok] = $type_sequence;
22836 unless ( ( $type eq 'b' ) || ( $type eq '#' ) ) {
22837 $last_last_nonblank_token = $last_nonblank_token;
22838 $last_last_nonblank_type = $last_nonblank_type;
22839 $last_last_nonblank_block_type = $last_nonblank_block_type;
22840 $last_last_nonblank_container_type = $last_nonblank_container_type;
22841 $last_last_nonblank_type_sequence = $last_nonblank_type_sequence;
22842 $last_nonblank_token = $tok;
22843 $last_nonblank_type = $type;
22844 $last_nonblank_block_type = $block_type;
22845 $last_nonblank_container_type = $container_type;
22846 $last_nonblank_type_sequence = $type_sequence;
22847 $last_nonblank_prototype = $prototype;
22850 # reset indentation level if necessary at a sub or package
22851 # in an attempt to recover from a nesting error
22852 if ( $level_in_tokenizer < 0 ) {
22853 if ( $input_line =~ /^\s*(sub|package)\s+(\w+)/ ) {
22854 reset_indentation_level(0);
22855 brace_warning("resetting level to 0 at $1 $2\n");
22859 # all done tokenizing this line ...
22860 # now prepare the final list of tokens and types
22862 my @token_type = (); # stack of output token types
22863 my @block_type = (); # stack of output code block types
22864 my @container_type = (); # stack of output code container types
22865 my @type_sequence = (); # stack of output type sequence numbers
22866 my @tokens = (); # output tokens
22867 my @levels = (); # structural brace levels of output tokens
22868 my @slevels = (); # secondary nesting levels of output tokens
22869 my @nesting_tokens = (); # string of tokens leading to this depth
22870 my @nesting_types = (); # string of token types leading to this depth
22871 my @nesting_blocks = (); # string of block types leading to this depth
22872 my @nesting_lists = (); # string of list types leading to this depth
22873 my @ci_string = (); # string needed to compute continuation indentation
22874 my @container_environment = (); # BLOCK or LIST
22875 my $container_environment = '';
22876 my $im = -1; # previous $i value
22878 my $ci_string_sum = ones_count($ci_string_in_tokenizer);
22880 # Computing Token Indentation
22882 # The final section of the tokenizer forms tokens and also computes
22883 # parameters needed to find indentation. It is much easier to do it
22884 # in the tokenizer than elsewhere. Here is a brief description of how
22885 # indentation is computed. Perl::Tidy computes indentation as the sum
22888 # (1) structural indentation, such as if/else/elsif blocks
22889 # (2) continuation indentation, such as long parameter call lists.
22891 # These are occasionally called primary and secondary indentation.
22893 # Structural indentation is introduced by tokens of type '{', although
22894 # the actual tokens might be '{', '(', or '['. Structural indentation
22895 # is of two types: BLOCK and non-BLOCK. Default structural indentation
22896 # is 4 characters if the standard indentation scheme is used.
22898 # Continuation indentation is introduced whenever a line at BLOCK level
22899 # is broken before its termination. Default continuation indentation
22900 # is 2 characters in the standard indentation scheme.
22902 # Both types of indentation may be nested arbitrarily deep and
22903 # interlaced. The distinction between the two is somewhat arbitrary.
22905 # For each token, we will define two variables which would apply if
22906 # the current statement were broken just before that token, so that
22907 # that token started a new line:
22909 # $level = the structural indentation level,
22910 # $ci_level = the continuation indentation level
22912 # The total indentation will be $level * (4 spaces) + $ci_level * (2 spaces),
22913 # assuming defaults. However, in some special cases it is customary
22914 # to modify $ci_level from this strict value.
22916 # The total structural indentation is easy to compute by adding and
22917 # subtracting 1 from a saved value as types '{' and '}' are seen. The
22918 # running value of this variable is $level_in_tokenizer.
22920 # The total continuation is much more difficult to compute, and requires
22921 # several variables. These veriables are:
22923 # $ci_string_in_tokenizer = a string of 1's and 0's indicating, for
22924 # each indentation level, if there are intervening open secondary
22925 # structures just prior to that level.
22926 # $continuation_string_in_tokenizer = a string of 1's and 0's indicating
22927 # if the last token at that level is "continued", meaning that it
22928 # is not the first token of an expression.
22929 # $nesting_block_string = a string of 1's and 0's indicating, for each
22930 # indentation level, if the level is of type BLOCK or not.
22931 # $nesting_block_flag = the most recent 1 or 0 of $nesting_block_string
22932 # $nesting_list_string = a string of 1's and 0's indicating, for each
22933 # indentation level, if it is is appropriate for list formatting.
22934 # If so, continuation indentation is used to indent long list items.
22935 # $nesting_list_flag = the most recent 1 or 0 of $nesting_list_string
22936 # @{$rslevel_stack} = a stack of total nesting depths at each
22937 # structural indentation level, where "total nesting depth" means
22938 # the nesting depth that would occur if every nesting token -- '{', '[',
22939 # and '(' -- , regardless of context, is used to compute a nesting
22942 #my $nesting_block_flag = ($nesting_block_string =~ /1$/);
22943 #my $nesting_list_flag = ($nesting_list_string =~ /1$/);
22945 my ( $ci_string_i, $level_i, $nesting_block_string_i,
22946 $nesting_list_string_i, $nesting_token_string_i,
22947 $nesting_type_string_i, );
22949 foreach $i ( @{$routput_token_list} )
22950 { # scan the list of pre-tokens indexes
22952 # self-checking for valid token types
22953 my $type = $routput_token_type->[$i];
22954 my $tok = $$rtokens[$i]; # the token, but ONLY if same as pretoken
22955 $level_i = $level_in_tokenizer;
22957 # This can happen by running perltidy on non-scripts
22958 # although it could also be bug introduced by programming change.
22959 # Perl silently accepts a 032 (^Z) and takes it as the end
22960 if ( !$is_valid_token_type{$type} ) {
22961 my $val = ord($type);
22963 "unexpected character decimal $val ($type) in script\n");
22964 $tokenizer_self->{_in_error} = 1;
22967 # ----------------------------------------------------------------
22968 # TOKEN TYPE PATCHES
22969 # output __END__, __DATA__, and format as type 'k' instead of ';'
22970 # to make html colors correct, etc.
22971 my $fix_type = $type;
22972 if ( $type eq ';' && $tok =~ /\w/ ) { $fix_type = 'k' }
22974 # output anonymous 'sub' as keyword
22975 if ( $type eq 't' && $tok eq 'sub' ) { $fix_type = 'k' }
22977 # -----------------------------------------------------------------
22979 $nesting_token_string_i = $nesting_token_string;
22980 $nesting_type_string_i = $nesting_type_string;
22981 $nesting_block_string_i = $nesting_block_string;
22982 $nesting_list_string_i = $nesting_list_string;
22984 # set primary indentation levels based on structural braces
22985 # Note: these are set so that the leading braces have a HIGHER
22986 # level than their CONTENTS, which is convenient for indentation
22987 # Also, define continuation indentation for each token.
22988 if ( $type eq '{' || $type eq 'L' ) {
22990 # use environment before updating
22991 $container_environment =
22992 $nesting_block_flag ? 'BLOCK'
22993 : $nesting_list_flag ? 'LIST'
22996 # if the difference between total nesting levels is not 1,
22997 # there are intervening non-structural nesting types between
22998 # this '{' and the previous unclosed '{'
22999 my $intervening_secondary_structure = 0;
23000 if ( @{$rslevel_stack} ) {
23001 $intervening_secondary_structure =
23002 $slevel_in_tokenizer - $rslevel_stack->[-1];
23005 # Continuation Indentation
23007 # Having tried setting continuation indentation both in the formatter and
23008 # in the tokenizer, I can say that setting it in the tokenizer is much,
23009 # much easier. The formatter already has too much to do, and can't
23010 # make decisions on line breaks without knowing what 'ci' will be at
23011 # arbitrary locations.
23013 # But a problem with setting the continuation indentation (ci) here
23014 # in the tokenizer is that we do not know where line breaks will actually
23015 # be. As a result, we don't know if we should propagate continuation
23016 # indentation to higher levels of structure.
23018 # For nesting of only structural indentation, we never need to do this.
23019 # For example, in a long if statement, like this
23021 # if ( !$output_block_type[$i]
23022 # && ($in_statement_continuation) )
23027 # the second line has ci but we do normally give the lines within the BLOCK
23028 # any ci. This would be true if we had blocks nested arbitrarily deeply.
23030 # But consider something like this, where we have created a break after
23031 # an opening paren on line 1, and the paren is not (currently) a
23032 # structural indentation token:
23034 # my $file = $menubar->Menubutton(
23035 # qw/-text File -underline 0 -menuitems/ => [
23037 # Cascade => '~View',
23041 # The second line has ci, so it would seem reasonable to propagate it
23042 # down, giving the third line 1 ci + 1 indentation. This suggests the
23043 # following rule, which is currently used to propagating ci down: if there
23044 # are any non-structural opening parens (or brackets, or braces), before
23045 # an opening structural brace, then ci is propagated down, and otherwise
23046 # not. The variable $intervening_secondary_structure contains this
23047 # information for the current token, and the string
23048 # "$ci_string_in_tokenizer" is a stack of previous values of this
23051 # save the current states
23052 push( @{$rslevel_stack}, 1 + $slevel_in_tokenizer );
23053 $level_in_tokenizer++;
23055 if ( $routput_block_type->[$i] ) {
23056 $nesting_block_flag = 1;
23057 $nesting_block_string .= '1';
23060 $nesting_block_flag = 0;
23061 $nesting_block_string .= '0';
23064 # we will use continuation indentation within containers
23065 # which are not blocks and not logical expressions
23067 if ( !$routput_block_type->[$i] ) {
23069 # propagate flag down at nested open parens
23070 if ( $routput_container_type->[$i] eq '(' ) {
23071 $bit = 1 if $nesting_list_flag;
23074 # use list continuation if not a logical grouping
23075 # /^(if|elsif|unless|while|and|or|not|&&|!|\|\||for|foreach)$/
23079 $is_logical_container{ $routput_container_type->[$i]
23083 $nesting_list_string .= $bit;
23084 $nesting_list_flag = $bit;
23086 $ci_string_in_tokenizer .=
23087 ( $intervening_secondary_structure != 0 ) ? '1' : '0';
23088 $ci_string_sum = ones_count($ci_string_in_tokenizer);
23089 $continuation_string_in_tokenizer .=
23090 ( $in_statement_continuation > 0 ) ? '1' : '0';
23092 # Sometimes we want to give an opening brace continuation indentation,
23093 # and sometimes not. For code blocks, we don't do it, so that the leading
23094 # '{' gets outdented, like this:
23096 # if ( !$output_block_type[$i]
23097 # && ($in_statement_continuation) )
23100 # For other types, we will give them continuation indentation. For example,
23101 # here is how a list looks with the opening paren indented:
23104 # ( [ "fred", "barney" ], [ "george", "jane", "elroy" ],
23105 # [ "homer", "marge", "bart" ], );
23107 # This looks best when 'ci' is one-half of the indentation (i.e., 2 and 4)
23109 my $total_ci = $ci_string_sum;
23111 !$routput_block_type->[$i] # patch: skip for BLOCK
23112 && ($in_statement_continuation)
23115 $total_ci += $in_statement_continuation
23116 unless ( $ci_string_in_tokenizer =~ /1$/ );
23119 $ci_string_i = $total_ci;
23120 $in_statement_continuation = 0;
23123 elsif ( $type eq '}' || $type eq 'R' ) {
23125 # only a nesting error in the script would prevent popping here
23126 if ( @{$rslevel_stack} > 1 ) { pop( @{$rslevel_stack} ); }
23128 $level_i = --$level_in_tokenizer;
23130 # restore previous level values
23131 if ( length($nesting_block_string) > 1 )
23132 { # true for valid script
23133 chop $nesting_block_string;
23134 $nesting_block_flag = ( $nesting_block_string =~ /1$/ );
23135 chop $nesting_list_string;
23136 $nesting_list_flag = ( $nesting_list_string =~ /1$/ );
23138 chop $ci_string_in_tokenizer;
23139 $ci_string_sum = ones_count($ci_string_in_tokenizer);
23141 $in_statement_continuation =
23142 chop $continuation_string_in_tokenizer;
23144 # zero continuation flag at terminal BLOCK '}' which
23145 # ends a statement.
23146 if ( $routput_block_type->[$i] ) {
23148 # ...These include non-anonymous subs
23149 # note: could be sub ::abc { or sub 'abc
23150 if ( $routput_block_type->[$i] =~ m/^sub\s*/gc ) {
23152 # note: older versions of perl require the /gc modifier
23153 # here or else the \G does not work.
23154 if ( $routput_block_type->[$i] =~ /\G('|::|\w)/gc )
23156 $in_statement_continuation = 0;
23160 # ...and include all block types except user subs with
23161 # block prototypes and these: (sort|grep|map|do|eval)
23162 # /^(\}|\{|BEGIN|END|CHECK|INIT|AUTOLOAD|DESTROY|continue|;|if|elsif|else|unless|while|until|for|foreach)$/
23164 $is_zero_continuation_block_type{
23165 $routput_block_type->[$i] } )
23167 $in_statement_continuation = 0;
23170 # ..but these are not terminal types:
23171 # /^(sort|grep|map|do|eval)$/ )
23173 $is_not_zero_continuation_block_type{
23174 $routput_block_type->[$i] } )
23178 # ..and a block introduced by a label
23179 # /^\w+\s*:$/gc ) {
23180 elsif ( $routput_block_type->[$i] =~ /:$/ ) {
23181 $in_statement_continuation = 0;
23184 # user function with block prototype
23186 $in_statement_continuation = 0;
23190 # If we are in a list, then
23191 # we must set continuatoin indentation at the closing
23192 # paren of something like this (paren after $check):
23195 # ( not defined $check )
23197 # or $check eq "new"
23198 # or $check eq "old",
23200 elsif ( $tok eq ')' ) {
23201 $in_statement_continuation = 1
23202 if $routput_container_type->[$i] =~ /^[;,\{\}]$/;
23206 # use environment after updating
23207 $container_environment =
23208 $nesting_block_flag ? 'BLOCK'
23209 : $nesting_list_flag ? 'LIST'
23211 $ci_string_i = $ci_string_sum + $in_statement_continuation;
23212 $nesting_block_string_i = $nesting_block_string;
23213 $nesting_list_string_i = $nesting_list_string;
23216 # not a structural indentation type..
23219 $container_environment =
23220 $nesting_block_flag ? 'BLOCK'
23221 : $nesting_list_flag ? 'LIST'
23224 # zero the continuation indentation at certain tokens so
23225 # that they will be at the same level as its container. For
23226 # commas, this simplifies the -lp indentation logic, which
23227 # counts commas. For ?: it makes them stand out.
23228 if ($nesting_list_flag) {
23229 if ( $type =~ /^[,\?\:]$/ ) {
23230 $in_statement_continuation = 0;
23234 # be sure binary operators get continuation indentation
23236 $container_environment
23237 && ( $type eq 'k' && $is_binary_keyword{$tok}
23238 || $is_binary_type{$type} )
23241 $in_statement_continuation = 1;
23244 # continuation indentation is sum of any open ci from previous
23245 # levels plus the current level
23246 $ci_string_i = $ci_string_sum + $in_statement_continuation;
23248 # update continuation flag ...
23249 # if this isn't a blank or comment..
23250 if ( $type ne 'b' && $type ne '#' ) {
23252 # and we are in a BLOCK
23253 if ($nesting_block_flag) {
23255 # the next token after a ';' and label starts a new stmt
23256 if ( $type eq ';' || $type eq 'J' ) {
23257 $in_statement_continuation = 0;
23260 # otherwise, we are continuing the current statement
23262 $in_statement_continuation = 1;
23266 # if we are not in a BLOCK..
23269 # do not use continuation indentation if not list
23270 # environment (could be within if/elsif clause)
23271 if ( !$nesting_list_flag ) {
23272 $in_statement_continuation = 0;
23275 # otherwise, the next token after a ',' starts a new term
23276 elsif ( $type eq ',' ) {
23277 $in_statement_continuation = 0;
23280 # otherwise, we are continuing the current term
23282 $in_statement_continuation = 1;
23288 if ( $level_in_tokenizer < 0 ) {
23289 unless ( $tokenizer_self->{_saw_negative_indentation} ) {
23290 $tokenizer_self->{_saw_negative_indentation} = 1;
23291 warning("Starting negative indentation\n");
23295 # set secondary nesting levels based on all continment token types
23296 # Note: these are set so that the nesting depth is the depth
23297 # of the PREVIOUS TOKEN, which is convenient for setting
23298 # the stength of token bonds
23299 my $slevel_i = $slevel_in_tokenizer;
23302 if ( $is_opening_type{$type} ) {
23303 $slevel_in_tokenizer++;
23304 $nesting_token_string .= $tok;
23305 $nesting_type_string .= $type;
23309 elsif ( $is_closing_type{$type} ) {
23310 $slevel_in_tokenizer--;
23311 my $char = chop $nesting_token_string;
23313 if ( $char ne $matching_start_token{$tok} ) {
23314 $nesting_token_string .= $char . $tok;
23315 $nesting_type_string .= $type;
23318 chop $nesting_type_string;
23322 push( @block_type, $routput_block_type->[$i] );
23323 push( @ci_string, $ci_string_i );
23324 push( @container_environment, $container_environment );
23325 push( @container_type, $routput_container_type->[$i] );
23326 push( @levels, $level_i );
23327 push( @nesting_tokens, $nesting_token_string_i );
23328 push( @nesting_types, $nesting_type_string_i );
23329 push( @slevels, $slevel_i );
23330 push( @token_type, $fix_type );
23331 push( @type_sequence, $routput_type_sequence->[$i] );
23332 push( @nesting_blocks, $nesting_block_string );
23333 push( @nesting_lists, $nesting_list_string );
23335 # now form the previous token
23338 $$rtoken_map[$i] - $$rtoken_map[$im]; # how many characters
23342 substr( $input_line, $$rtoken_map[$im], $num ) );
23348 $num = length($input_line) - $$rtoken_map[$im]; # make the last token
23350 push( @tokens, substr( $input_line, $$rtoken_map[$im], $num ) );
23353 $tokenizer_self->{_in_attribute_list} = $in_attribute_list;
23354 $tokenizer_self->{_in_quote} = $in_quote;
23355 $tokenizer_self->{_quote_target} =
23356 $in_quote ? matching_end_token($quote_character) : "";
23357 $tokenizer_self->{_rhere_target_list} = $rhere_target_list;
23359 $line_of_tokens->{_rtoken_type} = \@token_type;
23360 $line_of_tokens->{_rtokens} = \@tokens;
23361 $line_of_tokens->{_rblock_type} = \@block_type;
23362 $line_of_tokens->{_rcontainer_type} = \@container_type;
23363 $line_of_tokens->{_rcontainer_environment} = \@container_environment;
23364 $line_of_tokens->{_rtype_sequence} = \@type_sequence;
23365 $line_of_tokens->{_rlevels} = \@levels;
23366 $line_of_tokens->{_rslevels} = \@slevels;
23367 $line_of_tokens->{_rnesting_tokens} = \@nesting_tokens;
23368 $line_of_tokens->{_rci_levels} = \@ci_string;
23369 $line_of_tokens->{_rnesting_blocks} = \@nesting_blocks;
23373 } # end tokenize_this_line
23375 #########i#############################################################
23376 # Tokenizer routines which assist in identifying token types
23377 #######################################################################
23379 sub operator_expected {
23381 # Many perl symbols have two or more meanings. For example, '<<'
23382 # can be a shift operator or a here-doc operator. The
23383 # interpretation of these symbols depends on the current state of
23384 # the tokenizer, which may either be expecting a term or an
23385 # operator. For this example, a << would be a shift if an operator
23386 # is expected, and a here-doc if a term is expected. This routine
23387 # is called to make this decision for any current token. It returns
23388 # one of three possible values:
23390 # OPERATOR - operator expected (or at least, not a term)
23391 # UNKNOWN - can't tell
23392 # TERM - a term is expected (or at least, not an operator)
23394 # The decision is based on what has been seen so far. This
23395 # information is stored in the "$last_nonblank_type" and
23396 # "$last_nonblank_token" variables. For example, if the
23397 # $last_nonblank_type is '=~', then we are expecting a TERM, whereas
23398 # if $last_nonblank_type is 'n' (numeric), we are expecting an
23401 # If a UNKNOWN is returned, the calling routine must guess. A major
23402 # goal of this tokenizer is to minimize the possiblity of returning
23403 # UNKNOWN, because a wrong guess can spoil the formatting of a
23406 # adding NEW_TOKENS: it is critically important that this routine be
23407 # updated to allow it to determine if an operator or term is to be
23408 # expected after the new token. Doing this simply involves adding
23409 # the new token character to one of the regexes in this routine or
23410 # to one of the hash lists
23411 # that it uses, which are initialized in the BEGIN section.
23412 # USES GLOBAL VARIABLES: $last_nonblank_type, $last_nonblank_token,
23415 my ( $prev_type, $tok, $next_type ) = @_;
23417 my $op_expected = UNKNOWN;
23419 #print "tok=$tok last type=$last_nonblank_type last tok=$last_nonblank_token\n";
23421 # Note: function prototype is available for token type 'U' for future
23422 # program development. It contains the leading and trailing parens,
23423 # and no blanks. It might be used to eliminate token type 'C', for
23424 # example (prototype = '()'). Thus:
23425 # if ($last_nonblank_type eq 'U') {
23426 # print "previous token=$last_nonblank_token type=$last_nonblank_type prototype=$last_nonblank_prototype\n";
23429 # A possible filehandle (or object) requires some care...
23430 if ( $last_nonblank_type eq 'Z' ) {
23433 if ( $last_nonblank_token =~ /^[A-Za-z_]/ ) {
23434 $op_expected = UNKNOWN;
23437 # For possible file handle like "$a", Perl uses weird parsing rules.
23439 # print $a/2,"/hi"; - division
23440 # print $a / 2,"/hi"; - division
23441 # print $a/ 2,"/hi"; - division
23442 # print $a /2,"/hi"; - pattern (and error)!
23443 elsif ( ( $prev_type eq 'b' ) && ( $next_type ne 'b' ) ) {
23444 $op_expected = TERM;
23447 # Note when an operation is being done where a
23448 # filehandle might be expected, since a change in whitespace
23449 # could change the interpretation of the statement.
23451 if ( $tok =~ /^([x\/\+\-\*\%\&\.\?\<]|\>\>)$/ ) {
23452 complain("operator in print statement not recommended\n");
23453 $op_expected = OPERATOR;
23458 # handle something after 'do' and 'eval'
23459 elsif ( $is_block_operator{$last_nonblank_token} ) {
23461 # something like $a = eval "expression";
23463 if ( $last_nonblank_type eq 'k' ) {
23464 $op_expected = TERM; # expression or list mode following keyword
23467 # something like $a = do { BLOCK } / 2;
23470 $op_expected = OPERATOR; # block mode following }
23474 # handle bare word..
23475 elsif ( $last_nonblank_type eq 'w' ) {
23477 # unfortunately, we can't tell what type of token to expect next
23478 # after most bare words
23479 $op_expected = UNKNOWN;
23482 # operator, but not term possible after these types
23483 # Note: moved ')' from type to token because parens in list context
23484 # get marked as '{' '}' now. This is a minor glitch in the following:
23485 # my %opts = (ref $_[0] eq 'HASH') ? %{shift()} : ();
23487 elsif (( $last_nonblank_type =~ /^[\]RnviQh]$/ )
23488 || ( $last_nonblank_token =~ /^(\)|\$|\-\>)/ ) )
23490 $op_expected = OPERATOR;
23492 # in a 'use' statement, numbers and v-strings are not true
23493 # numbers, so to avoid incorrect error messages, we will
23494 # mark them as unknown for now (use.t)
23495 # TODO: it would be much nicer to create a new token V for VERSION
23496 # number in a use statement. Then this could be a check on type V
23497 # and related patches which change $statement_type for '=>'
23498 # and ',' could be removed. Further, it would clean things up to
23499 # scan the 'use' statement with a separate subroutine.
23500 if ( ( $statement_type eq 'use' )
23501 && ( $last_nonblank_type =~ /^[nv]$/ ) )
23503 $op_expected = UNKNOWN;
23507 # no operator after many keywords, such as "die", "warn", etc
23508 elsif ( $expecting_term_token{$last_nonblank_token} ) {
23510 # patch for dor.t (defined or).
23511 # perl functions which may be unary operators
23512 # TODO: This list is incomplete, and these should be put
23515 && $next_type eq '/'
23516 && $last_nonblank_type eq 'k'
23517 && $last_nonblank_token =~ /^eof|undef|shift|pop$/ )
23519 $op_expected = OPERATOR;
23522 $op_expected = TERM;
23526 # no operator after things like + - ** (i.e., other operators)
23527 elsif ( $expecting_term_types{$last_nonblank_type} ) {
23528 $op_expected = TERM;
23531 # a few operators, like "time", have an empty prototype () and so
23532 # take no parameters but produce a value to operate on
23533 elsif ( $expecting_operator_token{$last_nonblank_token} ) {
23534 $op_expected = OPERATOR;
23537 # post-increment and decrement produce values to be operated on
23538 elsif ( $expecting_operator_types{$last_nonblank_type} ) {
23539 $op_expected = OPERATOR;
23542 # no value to operate on after sub block
23543 elsif ( $last_nonblank_token =~ /^sub\s/ ) { $op_expected = TERM; }
23545 # a right brace here indicates the end of a simple block.
23546 # all non-structural right braces have type 'R'
23547 # all braces associated with block operator keywords have been given those
23548 # keywords as "last_nonblank_token" and caught above.
23549 # (This statement is order dependent, and must come after checking
23550 # $last_nonblank_token).
23551 elsif ( $last_nonblank_type eq '}' ) {
23553 # patch for dor.t (defined or).
23555 && $next_type eq '/'
23556 && $last_nonblank_token eq ']' )
23558 $op_expected = OPERATOR;
23561 $op_expected = TERM;
23565 # something else..what did I forget?
23568 # collecting diagnostics on unknown operator types..see what was missed
23569 $op_expected = UNKNOWN;
23571 "OP: unknown after type=$last_nonblank_type token=$last_nonblank_token\n"
23575 TOKENIZER_DEBUG_FLAG_EXPECT && do {
23577 "EXPECT: returns $op_expected for last type $last_nonblank_type token $last_nonblank_token\n";
23579 return $op_expected;
23582 sub new_statement_ok {
23584 # return true if the current token can start a new statement
23585 # USES GLOBAL VARIABLES: $last_nonblank_type
23587 return label_ok() # a label would be ok here
23589 || $last_nonblank_type eq 'J'; # or we follow a label
23595 # Decide if a bare word followed by a colon here is a label
23596 # USES GLOBAL VARIABLES: $last_nonblank_token, $last_nonblank_type,
23597 # $brace_depth, @brace_type
23599 # if it follows an opening or closing code block curly brace..
23600 if ( ( $last_nonblank_token eq '{' || $last_nonblank_token eq '}' )
23601 && $last_nonblank_type eq $last_nonblank_token )
23604 # it is a label if and only if the curly encloses a code block
23605 return $brace_type[$brace_depth];
23608 # otherwise, it is a label if and only if it follows a ';'
23611 return ( $last_nonblank_type eq ';' );
23615 sub code_block_type {
23617 # Decide if this is a block of code, and its type.
23618 # Must be called only when $type = $token = '{'
23619 # The problem is to distinguish between the start of a block of code
23620 # and the start of an anonymous hash reference
23621 # Returns "" if not code block, otherwise returns 'last_nonblank_token'
23622 # to indicate the type of code block. (For example, 'last_nonblank_token'
23623 # might be 'if' for an if block, 'else' for an else block, etc).
23624 # USES GLOBAL VARIABLES: $last_nonblank_token, $last_nonblank_type,
23625 # $last_nonblank_block_type, $brace_depth, @brace_type
23627 # handle case of multiple '{'s
23629 # print "BLOCK_TYPE EXAMINING: type=$last_nonblank_type tok=$last_nonblank_token\n";
23631 my ( $i, $rtokens, $rtoken_type, $max_token_index ) = @_;
23632 if ( $last_nonblank_token eq '{'
23633 && $last_nonblank_type eq $last_nonblank_token )
23636 # opening brace where a statement may appear is probably
23637 # a code block but might be and anonymous hash reference
23638 if ( $brace_type[$brace_depth] ) {
23639 return decide_if_code_block( $i, $rtokens, $rtoken_type,
23640 $max_token_index );
23643 # cannot start a code block within an anonymous hash
23649 elsif ( $last_nonblank_token eq ';' ) {
23651 # an opening brace where a statement may appear is probably
23652 # a code block but might be and anonymous hash reference
23653 return decide_if_code_block( $i, $rtokens, $rtoken_type,
23654 $max_token_index );
23657 # handle case of '}{'
23658 elsif ($last_nonblank_token eq '}'
23659 && $last_nonblank_type eq $last_nonblank_token )
23662 # a } { situation ...
23663 # could be hash reference after code block..(blktype1.t)
23664 if ($last_nonblank_block_type) {
23665 return decide_if_code_block( $i, $rtokens, $rtoken_type,
23666 $max_token_index );
23669 # must be a block if it follows a closing hash reference
23671 return $last_nonblank_token;
23675 # NOTE: braces after type characters start code blocks, but for
23676 # simplicity these are not identified as such. See also
23677 # sub is_non_structural_brace.
23678 # elsif ( $last_nonblank_type eq 't' ) {
23679 # return $last_nonblank_token;
23682 # brace after label:
23683 elsif ( $last_nonblank_type eq 'J' ) {
23684 return $last_nonblank_token;
23687 # otherwise, look at previous token. This must be a code block if
23688 # it follows any of these:
23689 # /^(BEGIN|END|CHECK|INIT|AUTOLOAD|DESTROY|continue|if|elsif|else|unless|do|while|until|eval|for|foreach|map|grep|sort)$/
23690 elsif ( $is_code_block_token{$last_nonblank_token} ) {
23691 return $last_nonblank_token;
23694 # or a sub definition
23695 elsif ( ( $last_nonblank_type eq 'i' || $last_nonblank_type eq 't' )
23696 && $last_nonblank_token =~ /^sub\b/ )
23698 return $last_nonblank_token;
23701 # user-defined subs with block parameters (like grep/map/eval)
23702 elsif ( $last_nonblank_type eq 'G' ) {
23703 return $last_nonblank_token;
23707 elsif ( $last_nonblank_type eq 'w' ) {
23708 return decide_if_code_block( $i, $rtokens, $rtoken_type,
23709 $max_token_index );
23712 # anything else must be anonymous hash reference
23718 sub decide_if_code_block {
23720 # USES GLOBAL VARIABLES: $last_nonblank_token
23721 my ( $i, $rtokens, $rtoken_type, $max_token_index ) = @_;
23722 my ( $next_nonblank_token, $i_next ) =
23723 find_next_nonblank_token( $i, $rtokens, $max_token_index );
23725 # we are at a '{' where a statement may appear.
23726 # We must decide if this brace starts an anonymous hash or a code
23728 # return "" if anonymous hash, and $last_nonblank_token otherwise
23730 # initialize to be code BLOCK
23731 my $code_block_type = $last_nonblank_token;
23733 # Check for the common case of an empty anonymous hash reference:
23734 # Maybe something like sub { { } }
23735 if ( $next_nonblank_token eq '}' ) {
23736 $code_block_type = "";
23741 # To guess if this '{' is an anonymous hash reference, look ahead
23742 # and test as follows:
23744 # it is a hash reference if next come:
23745 # - a string or digit followed by a comma or =>
23746 # - bareword followed by =>
23747 # otherwise it is a code block
23749 # Examples of anonymous hash ref:
23753 # Examples of code blocks:
23754 # {1; print "hello\n", 1;}
23757 # We are only going to look ahead one more (nonblank/comment) line.
23758 # Strange formatting could cause a bad guess, but that's unlikely.
23759 my @pre_types = @$rtoken_type[ $i + 1 .. $max_token_index ];
23760 my @pre_tokens = @$rtokens[ $i + 1 .. $max_token_index ];
23761 my ( $rpre_tokens, $rpre_types ) =
23762 peek_ahead_for_n_nonblank_pre_tokens(20); # 20 is arbitrary but
23763 # generous, and prevents
23765 # time in mangled files
23766 if ( defined($rpre_types) && @$rpre_types ) {
23767 push @pre_types, @$rpre_types;
23768 push @pre_tokens, @$rpre_tokens;
23771 # put a sentinal token to simplify stopping the search
23772 push @pre_types, '}';
23775 $jbeg = 1 if $pre_types[0] eq 'b';
23777 # first look for one of these
23779 # - bareword with leading -
23783 if ( $pre_types[$j] =~ /^[\'\"]/ ) {
23785 # find the closing quote; don't worry about escapes
23786 my $quote_mark = $pre_types[$j];
23787 for ( my $k = $j + 1 ; $k < $#pre_types ; $k++ ) {
23788 if ( $pre_types[$k] eq $quote_mark ) {
23790 my $next = $pre_types[$j];
23795 elsif ( $pre_types[$j] eq 'd' ) {
23798 elsif ( $pre_types[$j] eq 'w' ) {
23799 unless ( $is_keyword{ $pre_tokens[$j] } ) {
23803 elsif ( $pre_types[$j] eq '-' && $pre_types[ ++$j ] eq 'w' ) {
23806 if ( $j > $jbeg ) {
23808 $j++ if $pre_types[$j] eq 'b';
23810 # it's a hash ref if a comma or => follow next
23811 if ( $pre_types[$j] eq ','
23812 || ( $pre_types[$j] eq '=' && $pre_types[ ++$j ] eq '>' ) )
23814 $code_block_type = "";
23819 return $code_block_type;
23824 # report unexpected token type and show where it is
23825 # USES GLOBAL VARIABLES: $tokenizer_self
23826 my ( $found, $expecting, $i_tok, $last_nonblank_i, $rpretoken_map,
23827 $rpretoken_type, $input_line )
23830 if ( ++$tokenizer_self->{_unexpected_error_count} <= MAX_NAG_MESSAGES ) {
23831 my $msg = "found $found where $expecting expected";
23832 my $pos = $$rpretoken_map[$i_tok];
23833 interrupt_logfile();
23834 my $input_line_number = $tokenizer_self->{_last_line_number};
23835 my ( $offset, $numbered_line, $underline ) =
23836 make_numbered_line( $input_line_number, $input_line, $pos );
23837 $underline = write_on_underline( $underline, $pos - $offset, '^' );
23840 if ( ( $i_tok > 0 ) && ( $last_nonblank_i >= 0 ) ) {
23841 my $pos_prev = $$rpretoken_map[$last_nonblank_i];
23843 if ( $$rpretoken_type[ $i_tok - 1 ] eq 'b' ) {
23844 $num = $$rpretoken_map[ $i_tok - 1 ] - $pos_prev;
23847 $num = $pos - $pos_prev;
23849 if ( $num > 40 ) { $num = 40; $pos_prev = $pos - 40; }
23852 write_on_underline( $underline, $pos_prev - $offset, '-' x $num );
23853 $trailer = " (previous token underlined)";
23855 warning( $numbered_line . "\n" );
23856 warning( $underline . "\n" );
23857 warning( $msg . $trailer . "\n" );
23862 sub is_non_structural_brace {
23864 # Decide if a brace or bracket is structural or non-structural
23865 # by looking at the previous token and type
23866 # USES GLOBAL VARIABLES: $last_nonblank_type, $last_nonblank_token
23868 # EXPERIMENTAL: Mark slices as structural; idea was to improve formatting.
23869 # Tentatively deactivated because it caused the wrong operator expectation
23871 # $user = @vars[1] / 100;
23872 # Must update sub operator_expected before re-implementing.
23873 # if ( $last_nonblank_type eq 'i' && $last_nonblank_token =~ /^@/ ) {
23877 # NOTE: braces after type characters start code blocks, but for
23878 # simplicity these are not identified as such. See also
23879 # sub code_block_type
23880 # if ($last_nonblank_type eq 't') {return 0}
23882 # otherwise, it is non-structural if it is decorated
23883 # by type information.
23884 # For example, the '{' here is non-structural: ${xxx}
23886 $last_nonblank_token =~ /^([\$\@\*\&\%\)]|->|::)/
23888 # or if we follow a hash or array closing curly brace or bracket
23889 # For example, the second '{' in this is non-structural: $a{'x'}{'y'}
23890 # because the first '}' would have been given type 'R'
23891 || $last_nonblank_type =~ /^([R\]])$/
23895 #########i#############################################################
23896 # Tokenizer routines for tracking container nesting depths
23897 #######################################################################
23899 # The following routines keep track of nesting depths of the nesting
23900 # types, ( [ { and ?. This is necessary for determining the indentation
23901 # level, and also for debugging programs. Not only do they keep track of
23902 # nesting depths of the individual brace types, but they check that each
23903 # of the other brace types is balanced within matching pairs. For
23904 # example, if the program sees this sequence:
23908 # then it can determine that there is an extra left paren somewhere
23909 # between the { and the }. And so on with every other possible
23910 # combination of outer and inner brace types. For another
23915 # which has an extra ] within the parens.
23917 # The brace types have indexes 0 .. 3 which are indexes into
23920 # The pair ? : are treated as just another nesting type, with ? acting
23921 # as the opening brace and : acting as the closing brace.
23925 # $depth_array[$a][$b][ $current_depth[$a] ] = $current_depth[$b];
23927 # saves the nesting depth of brace type $b (where $b is either of the other
23928 # nesting types) when brace type $a enters a new depth. When this depth
23929 # decreases, a check is made that the current depth of brace types $b is
23930 # unchanged, or otherwise there must have been an error. This can
23931 # be very useful for localizing errors, particularly when perl runs to
23932 # the end of a large file (such as this one) and announces that there
23933 # is a problem somewhere.
23935 # A numerical sequence number is maintained for every nesting type,
23936 # so that each matching pair can be uniquely identified in a simple
23939 sub increase_nesting_depth {
23940 my ( $a, $pos ) = @_;
23942 # USES GLOBAL VARIABLES: $tokenizer_self, @current_depth,
23943 # @current_sequence_number, @depth_array, @starting_line_of_current_depth
23945 $current_depth[$a]++;
23946 my $input_line_number = $tokenizer_self->{_last_line_number};
23947 my $input_line = $tokenizer_self->{_line_text};
23949 # Sequence numbers increment by number of items. This keeps
23950 # a unique set of numbers but still allows the relative location
23951 # of any type to be determined.
23952 $nesting_sequence_number[$a] += scalar(@closing_brace_names);
23953 my $seqno = $nesting_sequence_number[$a];
23954 $current_sequence_number[$a][ $current_depth[$a] ] = $seqno;
23956 $starting_line_of_current_depth[$a][ $current_depth[$a] ] =
23957 [ $input_line_number, $input_line, $pos ];
23959 for $b ( 0 .. $#closing_brace_names ) {
23960 next if ( $b == $a );
23961 $depth_array[$a][$b][ $current_depth[$a] ] = $current_depth[$b];
23966 sub decrease_nesting_depth {
23968 my ( $a, $pos ) = @_;
23970 # USES GLOBAL VARIABLES: $tokenizer_self, @current_depth,
23971 # @current_sequence_number, @depth_array, @starting_line_of_current_depth
23974 my $input_line_number = $tokenizer_self->{_last_line_number};
23975 my $input_line = $tokenizer_self->{_line_text};
23977 if ( $current_depth[$a] > 0 ) {
23979 $seqno = $current_sequence_number[$a][ $current_depth[$a] ];
23981 # check that any brace types $b contained within are balanced
23982 for $b ( 0 .. $#closing_brace_names ) {
23983 next if ( $b == $a );
23985 unless ( $depth_array[$a][$b][ $current_depth[$a] ] ==
23986 $current_depth[$b] )
23989 $current_depth[$b] -
23990 $depth_array[$a][$b][ $current_depth[$a] ];
23992 # don't whine too many times
23993 my $saw_brace_error = get_saw_brace_error();
23995 $saw_brace_error <= MAX_NAG_MESSAGES
23997 # if too many closing types have occured, we probably
23998 # already caught this error
23999 && ( ( $diff > 0 ) || ( $saw_brace_error <= 0 ) )
24002 interrupt_logfile();
24004 $starting_line_of_current_depth[$a][ $current_depth[$a] ];
24006 my $rel = [ $input_line_number, $input_line, $pos ];
24010 if ( $diff == 1 || $diff == -1 ) {
24018 ? $opening_brace_names[$b]
24019 : $closing_brace_names[$b];
24020 write_error_indicator_pair( @$rsl, '^' );
24022 Found $diff extra $bname$ess between $opening_brace_names[$a] on line $sl and $closing_brace_names[$a] on line $el
24027 $starting_line_of_current_depth[$b]
24028 [ $current_depth[$b] ];
24031 " The most recent un-matched $bname is on line $ml\n";
24032 write_error_indicator_pair( @$rml, '^' );
24034 write_error_indicator_pair( @$rel, '^' );
24038 increment_brace_error();
24041 $current_depth[$a]--;
24045 my $saw_brace_error = get_saw_brace_error();
24046 if ( $saw_brace_error <= MAX_NAG_MESSAGES ) {
24048 There is no previous $opening_brace_names[$a] to match a $closing_brace_names[$a] on line $input_line_number
24050 indicate_error( $msg, $input_line_number, $input_line, $pos, '^' );
24052 increment_brace_error();
24057 sub check_final_nesting_depths {
24060 # USES GLOBAL VARIABLES: @current_depth, @starting_line_of_current_depth
24062 for $a ( 0 .. $#closing_brace_names ) {
24064 if ( $current_depth[$a] ) {
24065 my $rsl = $starting_line_of_current_depth[$a][ $current_depth[$a] ];
24068 Final nesting depth of $opening_brace_names[$a]s is $current_depth[$a]
24069 The most recent un-matched $opening_brace_names[$a] is on line $sl
24071 indicate_error( $msg, @$rsl, '^' );
24072 increment_brace_error();
24077 #########i#############################################################
24078 # Tokenizer routines for looking ahead in input stream
24079 #######################################################################
24081 sub peek_ahead_for_n_nonblank_pre_tokens {
24083 # returns next n pretokens if they exist
24084 # returns undef's if hits eof without seeing any pretokens
24085 # USES GLOBAL VARIABLES: $tokenizer_self
24086 my $max_pretokens = shift;
24089 my ( $rpre_tokens, $rmap, $rpre_types );
24091 while ( $line = $tokenizer_self->{_line_buffer_object}->peek_ahead( $i++ ) )
24093 $line =~ s/^\s*//; # trim leading blanks
24094 next if ( length($line) <= 0 ); # skip blank
24095 next if ( $line =~ /^#/ ); # skip comment
24096 ( $rpre_tokens, $rmap, $rpre_types ) =
24097 pre_tokenize( $line, $max_pretokens );
24100 return ( $rpre_tokens, $rpre_types );
24103 # look ahead for next non-blank, non-comment line of code
24104 sub peek_ahead_for_nonblank_token {
24106 # USES GLOBAL VARIABLES: $tokenizer_self
24107 my ( $rtokens, $max_token_index ) = @_;
24111 while ( $line = $tokenizer_self->{_line_buffer_object}->peek_ahead( $i++ ) )
24113 $line =~ s/^\s*//; # trim leading blanks
24114 next if ( length($line) <= 0 ); # skip blank
24115 next if ( $line =~ /^#/ ); # skip comment
24116 my ( $rtok, $rmap, $rtype ) =
24117 pre_tokenize( $line, 2 ); # only need 2 pre-tokens
24118 my $j = $max_token_index + 1;
24121 foreach $tok (@$rtok) {
24122 last if ( $tok =~ "\n" );
24123 $$rtokens[ ++$j ] = $tok;
24130 #########i#############################################################
24131 # Tokenizer guessing routines for ambiguous situations
24132 #######################################################################
24134 sub guess_if_pattern_or_conditional {
24136 # this routine is called when we have encountered a ? following an
24137 # unknown bareword, and we must decide if it starts a pattern or not
24138 # input parameters:
24139 # $i - token index of the ? starting possible pattern
24140 # output parameters:
24141 # $is_pattern = 0 if probably not pattern, =1 if probably a pattern
24142 # msg = a warning or diagnostic message
24143 # USES GLOBAL VARIABLES: $last_nonblank_token
24144 my ( $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
24145 my $is_pattern = 0;
24146 my $msg = "guessing that ? after $last_nonblank_token starts a ";
24148 if ( $i >= $max_token_index ) {
24149 $msg .= "conditional (no end to pattern found on the line)\n";
24154 my $next_token = $$rtokens[$i]; # first token after ?
24156 # look for a possible ending ? on this line..
24158 my $quote_depth = 0;
24159 my $quote_character = '';
24163 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
24166 = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
24167 $quote_pos, $quote_depth, $max_token_index );
24171 # we didn't find an ending ? on this line,
24172 # so we bias towards conditional
24174 $msg .= "conditional (no ending ? on this line)\n";
24176 # we found an ending ?, so we bias towards a pattern
24180 if ( pattern_expected( $i, $rtokens, $max_token_index ) >= 0 ) {
24182 $msg .= "pattern (found ending ? and pattern expected)\n";
24185 $msg .= "pattern (uncertain, but found ending ?)\n";
24189 return ( $is_pattern, $msg );
24192 sub guess_if_pattern_or_division {
24194 # this routine is called when we have encountered a / following an
24195 # unknown bareword, and we must decide if it starts a pattern or is a
24197 # input parameters:
24198 # $i - token index of the / starting possible pattern
24199 # output parameters:
24200 # $is_pattern = 0 if probably division, =1 if probably a pattern
24201 # msg = a warning or diagnostic message
24202 # USES GLOBAL VARIABLES: $last_nonblank_token
24203 my ( $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
24204 my $is_pattern = 0;
24205 my $msg = "guessing that / after $last_nonblank_token starts a ";
24207 if ( $i >= $max_token_index ) {
24208 "division (no end to pattern found on the line)\n";
24212 my $divide_expected =
24213 numerator_expected( $i, $rtokens, $max_token_index );
24215 my $next_token = $$rtokens[$i]; # first token after slash
24217 # look for a possible ending / on this line..
24219 my $quote_depth = 0;
24220 my $quote_character = '';
24224 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
24227 = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
24228 $quote_pos, $quote_depth, $max_token_index );
24232 # we didn't find an ending / on this line,
24233 # so we bias towards division
24234 if ( $divide_expected >= 0 ) {
24236 $msg .= "division (no ending / on this line)\n";
24239 $msg = "multi-line pattern (division not possible)\n";
24245 # we found an ending /, so we bias towards a pattern
24248 if ( pattern_expected( $i, $rtokens, $max_token_index ) >= 0 ) {
24250 if ( $divide_expected >= 0 ) {
24252 if ( $i - $ibeg > 60 ) {
24253 $msg .= "division (matching / too distant)\n";
24257 $msg .= "pattern (but division possible too)\n";
24263 $msg .= "pattern (division not possible)\n";
24268 if ( $divide_expected >= 0 ) {
24270 $msg .= "division (pattern not possible)\n";
24275 "pattern (uncertain, but division would not work here)\n";
24280 return ( $is_pattern, $msg );
24283 # try to resolve here-doc vs. shift by looking ahead for
24284 # non-code or the end token (currently only looks for end token)
24285 # returns 1 if it is probably a here doc, 0 if not
24286 sub guess_if_here_doc {
24288 # This is how many lines we will search for a target as part of the
24289 # guessing strategy. It is a constant because there is probably
24290 # little reason to change it.
24291 # USES GLOBAL VARIABLES: $tokenizer_self, $current_package
24293 use constant HERE_DOC_WINDOW => 40;
24295 my $next_token = shift;
24296 my $here_doc_expected = 0;
24299 my $msg = "checking <<";
24301 while ( $line = $tokenizer_self->{_line_buffer_object}->peek_ahead( $k++ ) )
24305 if ( $line =~ /^$next_token$/ ) {
24306 $msg .= " -- found target $next_token ahead $k lines\n";
24307 $here_doc_expected = 1; # got it
24310 last if ( $k >= HERE_DOC_WINDOW );
24313 unless ($here_doc_expected) {
24315 if ( !defined($line) ) {
24316 $here_doc_expected = -1; # hit eof without seeing target
24317 $msg .= " -- must be shift; target $next_token not in file\n";
24320 else { # still unsure..taking a wild guess
24322 if ( !$is_constant{$current_package}{$next_token} ) {
24323 $here_doc_expected = 1;
24325 " -- guessing it's a here-doc ($next_token not a constant)\n";
24329 " -- guessing it's a shift ($next_token is a constant)\n";
24333 write_logfile_entry($msg);
24334 return $here_doc_expected;
24337 #########i#############################################################
24338 # Tokenizer Routines for scanning identifiers and related items
24339 #######################################################################
24341 sub scan_bare_identifier_do {
24343 # this routine is called to scan a token starting with an alphanumeric
24344 # variable or package separator, :: or '.
24345 # USES GLOBAL VARIABLES: $current_package, $last_nonblank_token,
24346 # $last_nonblank_type,@paren_type, $paren_depth
24348 my ( $input_line, $i, $tok, $type, $prototype, $rtoken_map,
24352 my $package = undef;
24356 # we have to back up one pretoken at a :: since each : is one pretoken
24357 if ( $tok eq '::' ) { $i_beg-- }
24358 if ( $tok eq '->' ) { $i_beg-- }
24359 my $pos_beg = $$rtoken_map[$i_beg];
24360 pos($input_line) = $pos_beg;
24367 if ( $input_line =~ m/\G\s*((?:\w*(?:'|::)))*(?:(?:->)?(\w+))?/gc ) {
24369 my $pos = pos($input_line);
24370 my $numc = $pos - $pos_beg;
24371 $tok = substr( $input_line, $pos_beg, $numc );
24373 # type 'w' includes anything without leading type info
24374 # ($,%,@,*) including something like abc::def::ghi
24378 if ( defined($2) ) { $sub_name = $2; }
24379 if ( defined($1) ) {
24382 # patch: don't allow isolated package name which just ends
24383 # in the old style package separator (single quote). Example:
24385 if ( !($sub_name) && substr( $package, -1, 1 ) eq '\'' ) {
24389 $package =~ s/\'/::/g;
24390 if ( $package =~ /^\:/ ) { $package = 'main' . $package }
24391 $package =~ s/::$//;
24394 $package = $current_package;
24396 if ( $is_keyword{$tok} ) {
24401 # if it is a bareword..
24402 if ( $type eq 'w' ) {
24404 # check for v-string with leading 'v' type character
24405 # (This seems to have presidence over filehandle, type 'Y')
24406 if ( $tok =~ /^v\d[_\d]*$/ ) {
24408 # we only have the first part - something like 'v101' -
24410 if ( $input_line =~ m/\G(\.\d[_\d]*)+/gc ) {
24411 $pos = pos($input_line);
24412 $numc = $pos - $pos_beg;
24413 $tok = substr( $input_line, $pos_beg, $numc );
24417 # warn if this version can't handle v-strings
24418 report_v_string($tok);
24421 elsif ( $is_constant{$package}{$sub_name} ) {
24425 # bareword after sort has implied empty prototype; for example:
24426 # @sorted = sort numerically ( 53, 29, 11, 32, 7 );
24427 # This has priority over whatever the user has specified.
24428 elsif ($last_nonblank_token eq 'sort'
24429 && $last_nonblank_type eq 'k' )
24434 # Note: strangely, perl does not seem to really let you create
24435 # functions which act like eval and do, in the sense that eval
24436 # and do may have operators following the final }, but any operators
24437 # that you create with prototype (&) apparently do not allow
24438 # trailing operators, only terms. This seems strange.
24439 # If this ever changes, here is the update
24440 # to make perltidy behave accordingly:
24442 # elsif ( $is_block_function{$package}{$tok} ) {
24443 # $tok='eval'; # patch to do braces like eval - doesn't work
24446 # FIXME: This could become a separate type to allow for different
24448 elsif ( $is_block_function{$package}{$sub_name} ) {
24452 elsif ( $is_block_list_function{$package}{$sub_name} ) {
24455 elsif ( $is_user_function{$package}{$sub_name} ) {
24457 $prototype = $user_function_prototype{$package}{$sub_name};
24460 # check for indirect object
24463 # added 2001-03-27: must not be followed immediately by '('
24465 ( $input_line !~ m/\G\(/gc )
24470 # preceded by keyword like 'print', 'printf' and friends
24471 $is_indirect_object_taker{$last_nonblank_token}
24473 # or preceded by something like 'print(' or 'printf('
24475 ( $last_nonblank_token eq '(' )
24476 && $is_indirect_object_taker{ $paren_type[$paren_depth]
24484 # may not be indirect object unless followed by a space
24485 if ( $input_line =~ m/\G\s+/gc ) {
24489 # Perl's indirect object notation is a very bad
24490 # thing and can cause subtle bugs, especially for
24491 # beginning programmers. And I haven't even been
24492 # able to figure out a sane warning scheme which
24493 # doesn't get in the way of good scripts.
24495 # Complain if a filehandle has any lower case
24496 # letters. This is suggested good practice, but the
24497 # main reason for this warning is that prior to
24498 # release 20010328, perltidy incorrectly parsed a
24499 # function call after a print/printf, with the
24500 # result that a space got added before the opening
24501 # paren, thereby converting the function name to a
24502 # filehandle according to perl's weird rules. This
24503 # will not usually generate a syntax error, so this
24504 # is a potentially serious bug. By warning
24505 # of filehandles with any lower case letters,
24506 # followed by opening parens, we will help the user
24507 # find almost all of these older errors.
24508 # use 'sub_name' because something like
24509 # main::MYHANDLE is ok for filehandle
24510 if ( $sub_name =~ /[a-z]/ ) {
24512 # could be bug caused by older perltidy if
24514 if ( $input_line =~ m/\G\s*\(/gc ) {
24516 "Caution: unknown word '$tok' in indirect object slot\n"
24522 # bareword not followed by a space -- may not be filehandle
24523 # (may be function call defined in a 'use' statement)
24530 # Now we must convert back from character position
24531 # to pre_token index.
24532 # I don't think an error flag can occur here ..but who knows
24535 inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
24537 warning("scan_bare_identifier: Possibly invalid tokenization\n");
24541 # no match but line not blank - could be syntax error
24542 # perl will take '::' alone without complaint
24546 # change this warning to log message if it becomes annoying
24547 warning("didn't find identifier after leading ::\n");
24549 return ( $i, $tok, $type, $prototype );
24554 # This is the new scanner and will eventually replace scan_identifier.
24555 # Only type 'sub' and 'package' are implemented.
24556 # Token types $ * % @ & -> are not yet implemented.
24558 # Scan identifier following a type token.
24559 # The type of call depends on $id_scan_state: $id_scan_state = ''
24560 # for starting call, in which case $tok must be the token defining
24563 # If the type token is the last nonblank token on the line, a value
24564 # of $id_scan_state = $tok is returned, indicating that further
24565 # calls must be made to get the identifier. If the type token is
24566 # not the last nonblank token on the line, the identifier is
24567 # scanned and handled and a value of '' is returned.
24568 # USES GLOBAL VARIABLES: $current_package, $last_nonblank_token, $in_attribute_list,
24569 # $statement_type, $tokenizer_self
24571 my ( $input_line, $i, $tok, $rtokens, $rtoken_map, $id_scan_state,
24575 my ( $i_beg, $pos_beg );
24577 #print "NSCAN:entering i=$i, tok=$tok, type=$type, state=$id_scan_state\n";
24578 #my ($a,$b,$c) = caller;
24579 #print "NSCAN: scan_id called with tok=$tok $a $b $c\n";
24581 # on re-entry, start scanning at first token on the line
24582 if ($id_scan_state) {
24587 # on initial entry, start scanning just after type token
24590 $id_scan_state = $tok;
24594 # find $i_beg = index of next nonblank token,
24595 # and handle empty lines
24596 my $blank_line = 0;
24597 my $next_nonblank_token = $$rtokens[$i_beg];
24598 if ( $i_beg > $max_token_index ) {
24603 # only a '#' immediately after a '$' is not a comment
24604 if ( $next_nonblank_token eq '#' ) {
24605 unless ( $tok eq '$' ) {
24610 if ( $next_nonblank_token =~ /^\s/ ) {
24611 ( $next_nonblank_token, $i_beg ) =
24612 find_next_nonblank_token_on_this_line( $i_beg, $rtokens,
24613 $max_token_index );
24614 if ( $next_nonblank_token =~ /(^#|^\s*$)/ ) {
24620 # handle non-blank line; identifier, if any, must follow
24621 unless ($blank_line) {
24623 if ( $id_scan_state eq 'sub' ) {
24624 ( $i, $tok, $type, $id_scan_state ) = do_scan_sub(
24625 $input_line, $i, $i_beg,
24626 $tok, $type, $rtokens,
24627 $rtoken_map, $id_scan_state, $max_token_index
24631 elsif ( $id_scan_state eq 'package' ) {
24632 ( $i, $tok, $type ) =
24633 do_scan_package( $input_line, $i, $i_beg, $tok, $type, $rtokens,
24634 $rtoken_map, $max_token_index );
24635 $id_scan_state = '';
24639 warning("invalid token in scan_id: $tok\n");
24640 $id_scan_state = '';
24644 if ( $id_scan_state && ( !defined($type) || !$type ) ) {
24646 # shouldn't happen:
24648 "Program bug in scan_id: undefined type but scan_state=$id_scan_state\n"
24650 report_definite_bug();
24653 TOKENIZER_DEBUG_FLAG_NSCAN && do {
24655 "NSCAN: returns i=$i, tok=$tok, type=$type, state=$id_scan_state\n";
24657 return ( $i, $tok, $type, $id_scan_state );
24660 sub check_prototype {
24661 my ( $proto, $package, $subname ) = @_;
24662 return unless ( defined($package) && defined($subname) );
24663 if ( defined($proto) ) {
24664 $proto =~ s/^\s*\(\s*//;
24665 $proto =~ s/\s*\)$//;
24667 $is_user_function{$package}{$subname} = 1;
24668 $user_function_prototype{$package}{$subname} = "($proto)";
24670 # prototypes containing '&' must be treated specially..
24671 if ( $proto =~ /\&/ ) {
24673 # right curly braces of prototypes ending in
24674 # '&' may be followed by an operator
24675 if ( $proto =~ /\&$/ ) {
24676 $is_block_function{$package}{$subname} = 1;
24679 # right curly braces of prototypes NOT ending in
24680 # '&' may NOT be followed by an operator
24681 elsif ( $proto !~ /\&$/ ) {
24682 $is_block_list_function{$package}{$subname} = 1;
24687 $is_constant{$package}{$subname} = 1;
24691 $is_user_function{$package}{$subname} = 1;
24695 sub do_scan_package {
24697 # do_scan_package parses a package name
24698 # it is called with $i_beg equal to the index of the first nonblank
24699 # token following a 'package' token.
24700 # USES GLOBAL VARIABLES: $current_package,
24702 my ( $input_line, $i, $i_beg, $tok, $type, $rtokens, $rtoken_map,
24705 my $package = undef;
24706 my $pos_beg = $$rtoken_map[$i_beg];
24707 pos($input_line) = $pos_beg;
24709 # handle non-blank line; package name, if any, must follow
24710 if ( $input_line =~ m/\G\s*((?:\w*(?:'|::))*\w+)/gc ) {
24712 $package = ( defined($1) && $1 ) ? $1 : 'main';
24713 $package =~ s/\'/::/g;
24714 if ( $package =~ /^\:/ ) { $package = 'main' . $package }
24715 $package =~ s/::$//;
24716 my $pos = pos($input_line);
24717 my $numc = $pos - $pos_beg;
24718 $tok = 'package ' . substr( $input_line, $pos_beg, $numc );
24721 # Now we must convert back from character position
24722 # to pre_token index.
24723 # I don't think an error flag can occur here ..but ?
24726 inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
24727 if ($error) { warning("Possibly invalid package\n") }
24728 $current_package = $package;
24731 my ( $next_nonblank_token, $i_next ) =
24732 find_next_nonblank_token( $i, $rtokens, $max_token_index );
24733 if ( $next_nonblank_token !~ /^[;\}]$/ ) {
24735 "Unexpected '$next_nonblank_token' after package name '$tok'\n"
24740 # no match but line not blank --
24741 # could be a label with name package, like package: , for example.
24746 return ( $i, $tok, $type );
24749 sub scan_identifier_do {
24751 # This routine assembles tokens into identifiers. It maintains a
24752 # scan state, id_scan_state. It updates id_scan_state based upon
24753 # current id_scan_state and token, and returns an updated
24754 # id_scan_state and the next index after the identifier.
24755 # USES GLOBAL VARIABLES: $context, $last_nonblank_token,
24756 # $last_nonblank_type
24758 my ( $i, $id_scan_state, $identifier, $rtokens, $max_token_index ) = @_;
24761 my $tok_begin = $$rtokens[$i_begin];
24762 if ( $tok_begin eq ':' ) { $tok_begin = '::' }
24763 my $id_scan_state_begin = $id_scan_state;
24764 my $identifier_begin = $identifier;
24765 my $tok = $tok_begin;
24768 # these flags will be used to help figure out the type:
24769 my $saw_alpha = ( $tok =~ /^[A-Za-z_]/ );
24772 # allow old package separator (') except in 'use' statement
24773 my $allow_tick = ( $last_nonblank_token ne 'use' );
24775 # get started by defining a type and a state if necessary
24776 unless ($id_scan_state) {
24777 $context = UNKNOWN_CONTEXT;
24779 # fixup for digraph
24780 if ( $tok eq '>' ) {
24784 $identifier = $tok;
24786 if ( $tok eq '$' || $tok eq '*' ) {
24787 $id_scan_state = '$';
24788 $context = SCALAR_CONTEXT;
24790 elsif ( $tok eq '%' || $tok eq '@' ) {
24791 $id_scan_state = '$';
24792 $context = LIST_CONTEXT;
24794 elsif ( $tok eq '&' ) {
24795 $id_scan_state = '&';
24797 elsif ( $tok eq 'sub' or $tok eq 'package' ) {
24798 $saw_alpha = 0; # 'sub' is considered type info here
24799 $id_scan_state = '$';
24800 $identifier .= ' '; # need a space to separate sub from sub name
24802 elsif ( $tok eq '::' ) {
24803 $id_scan_state = 'A';
24805 elsif ( $tok =~ /^[A-Za-z_]/ ) {
24806 $id_scan_state = ':';
24808 elsif ( $tok eq '->' ) {
24809 $id_scan_state = '$';
24814 my ( $a, $b, $c ) = caller;
24815 warning("Program Bug: scan_identifier given bad token = $tok \n");
24816 warning(" called from sub $a line: $c\n");
24817 report_definite_bug();
24819 $saw_type = !$saw_alpha;
24823 $saw_type = ( $tok =~ /([\$\%\@\*\&])/ );
24826 # now loop to gather the identifier
24829 while ( $i < $max_token_index ) {
24830 $i_save = $i unless ( $tok =~ /^\s*$/ );
24831 $tok = $$rtokens[ ++$i ];
24833 if ( ( $tok eq ':' ) && ( $$rtokens[ $i + 1 ] eq ':' ) ) {
24838 if ( $id_scan_state eq '$' ) { # starting variable name
24840 if ( $tok eq '$' ) {
24842 $identifier .= $tok;
24844 # we've got a punctuation variable if end of line (punct.t)
24845 if ( $i == $max_token_index ) {
24847 $id_scan_state = '';
24851 elsif ( $tok =~ /^[A-Za-z_]/ ) { # alphanumeric ..
24853 $id_scan_state = ':'; # now need ::
24854 $identifier .= $tok;
24856 elsif ( $tok eq "'" && $allow_tick ) { # alphanumeric ..
24858 $id_scan_state = ':'; # now need ::
24859 $identifier .= $tok;
24861 # Perl will accept leading digits in identifiers,
24862 # although they may not always produce useful results.
24863 # Something like $main::0 is ok. But this also works:
24865 # sub howdy::123::bubba{ print "bubba $54321!\n" }
24866 # howdy::123::bubba();
24869 elsif ( $tok =~ /^[0-9]/ ) { # numeric
24871 $id_scan_state = ':'; # now need ::
24872 $identifier .= $tok;
24874 elsif ( $tok eq '::' ) {
24875 $id_scan_state = 'A';
24876 $identifier .= $tok;
24878 elsif ( ( $tok eq '#' ) && ( $identifier eq '$' ) ) { # $#array
24879 $identifier .= $tok; # keep same state, a $ could follow
24881 elsif ( $tok eq '{' ) {
24883 # check for something like ${#} or ${©}
24884 if ( $identifier eq '$'
24885 && $i + 2 <= $max_token_index
24886 && $$rtokens[ $i + 2 ] eq '}'
24887 && $$rtokens[ $i + 1 ] !~ /[\s\w]/ )
24889 my $next2 = $$rtokens[ $i + 2 ];
24890 my $next1 = $$rtokens[ $i + 1 ];
24891 $identifier .= $tok . $next1 . $next2;
24893 $id_scan_state = '';
24897 # skip something like ${xxx} or ->{
24898 $id_scan_state = '';
24900 # if this is the first token of a line, any tokens for this
24901 # identifier have already been accumulated
24902 if ( $identifier eq '$' || $i == 0 ) { $identifier = ''; }
24907 # space ok after leading $ % * & @
24908 elsif ( $tok =~ /^\s*$/ ) {
24910 if ( $identifier =~ /^[\$\%\*\&\@]/ ) {
24912 if ( length($identifier) > 1 ) {
24913 $id_scan_state = '';
24915 $type = 'i'; # probably punctuation variable
24920 # spaces after $'s are common, and space after @
24921 # is harmless, so only complain about space
24922 # after other type characters. Space after $ and
24923 # @ will be removed in formatting. Report space
24924 # after % and * because they might indicate a
24925 # parsing error. In other words '% ' might be a
24926 # modulo operator. Delete this warning if it
24928 if ( $identifier !~ /^[\@\$]$/ ) {
24930 "Space in identifier, following $identifier\n";
24936 # space after '->' is ok
24938 elsif ( $tok eq '^' ) {
24940 # check for some special variables like $^W
24941 if ( $identifier =~ /^[\$\*\@\%]$/ ) {
24942 $identifier .= $tok;
24943 $id_scan_state = 'A';
24945 # Perl accepts '$^]' or '@^]', but
24946 # there must not be a space before the ']'.
24947 my $next1 = $$rtokens[ $i + 1 ];
24948 if ( $next1 eq ']' ) {
24950 $identifier .= $next1;
24951 $id_scan_state = "";
24956 $id_scan_state = '';
24959 else { # something else
24961 # check for various punctuation variables
24962 if ( $identifier =~ /^[\$\*\@\%]$/ ) {
24963 $identifier .= $tok;
24966 elsif ( $identifier eq '$#' ) {
24968 if ( $tok eq '{' ) { $type = 'i'; $i = $i_save }
24970 # perl seems to allow just these: $#: $#- $#+
24971 elsif ( $tok =~ /^[\:\-\+]$/ ) {
24973 $identifier .= $tok;
24977 write_logfile_entry( 'Use of $# is deprecated' . "\n" );
24980 elsif ( $identifier eq '$$' ) {
24982 # perl does not allow references to punctuation
24983 # variables without braces. For example, this
24987 # You would have to use
24991 if ( $tok eq '{' ) { $type = 't' }
24992 else { $type = 'i' }
24994 elsif ( $identifier eq '->' ) {
24999 if ( length($identifier) == 1 ) { $identifier = ''; }
25001 $id_scan_state = '';
25005 elsif ( $id_scan_state eq '&' ) { # starting sub call?
25007 if ( $tok =~ /^[\$A-Za-z_]/ ) { # alphanumeric ..
25008 $id_scan_state = ':'; # now need ::
25010 $identifier .= $tok;
25012 elsif ( $tok eq "'" && $allow_tick ) { # alphanumeric ..
25013 $id_scan_state = ':'; # now need ::
25015 $identifier .= $tok;
25017 elsif ( $tok =~ /^[0-9]/ ) { # numeric..see comments above
25018 $id_scan_state = ':'; # now need ::
25020 $identifier .= $tok;
25022 elsif ( $tok =~ /^\s*$/ ) { # allow space
25024 elsif ( $tok eq '::' ) { # leading ::
25025 $id_scan_state = 'A'; # accept alpha next
25026 $identifier .= $tok;
25028 elsif ( $tok eq '{' ) {
25029 if ( $identifier eq '&' || $i == 0 ) { $identifier = ''; }
25031 $id_scan_state = '';
25036 # punctuation variable?
25037 # testfile: cunningham4.pl
25038 if ( $identifier eq '&' ) {
25039 $identifier .= $tok;
25046 $id_scan_state = '';
25050 elsif ( $id_scan_state eq 'A' ) { # looking for alpha (after ::)
25052 if ( $tok =~ /^[A-Za-z_]/ ) { # found it
25053 $identifier .= $tok;
25054 $id_scan_state = ':'; # now need ::
25057 elsif ( $tok eq "'" && $allow_tick ) {
25058 $identifier .= $tok;
25059 $id_scan_state = ':'; # now need ::
25062 elsif ( $tok =~ /^[0-9]/ ) { # numeric..see comments above
25063 $identifier .= $tok;
25064 $id_scan_state = ':'; # now need ::
25067 elsif ( ( $identifier =~ /^sub / ) && ( $tok =~ /^\s*$/ ) ) {
25068 $id_scan_state = '(';
25069 $identifier .= $tok;
25071 elsif ( ( $identifier =~ /^sub / ) && ( $tok eq '(' ) ) {
25072 $id_scan_state = ')';
25073 $identifier .= $tok;
25076 $id_scan_state = '';
25081 elsif ( $id_scan_state eq ':' ) { # looking for :: after alpha
25083 if ( $tok eq '::' ) { # got it
25084 $identifier .= $tok;
25085 $id_scan_state = 'A'; # now require alpha
25087 elsif ( $tok =~ /^[A-Za-z_]/ ) { # more alphanumeric is ok here
25088 $identifier .= $tok;
25089 $id_scan_state = ':'; # now need ::
25092 elsif ( $tok =~ /^[0-9]/ ) { # numeric..see comments above
25093 $identifier .= $tok;
25094 $id_scan_state = ':'; # now need ::
25097 elsif ( $tok eq "'" && $allow_tick ) { # tick
25099 if ( $is_keyword{$identifier} ) {
25100 $id_scan_state = ''; # that's all
25104 $identifier .= $tok;
25107 elsif ( ( $identifier =~ /^sub / ) && ( $tok =~ /^\s*$/ ) ) {
25108 $id_scan_state = '(';
25109 $identifier .= $tok;
25111 elsif ( ( $identifier =~ /^sub / ) && ( $tok eq '(' ) ) {
25112 $id_scan_state = ')';
25113 $identifier .= $tok;
25116 $id_scan_state = ''; # that's all
25121 elsif ( $id_scan_state eq '(' ) { # looking for ( of prototype
25123 if ( $tok eq '(' ) { # got it
25124 $identifier .= $tok;
25125 $id_scan_state = ')'; # now find the end of it
25127 elsif ( $tok =~ /^\s*$/ ) { # blank - keep going
25128 $identifier .= $tok;
25131 $id_scan_state = ''; # that's all - no prototype
25136 elsif ( $id_scan_state eq ')' ) { # looking for ) to end
25138 if ( $tok eq ')' ) { # got it
25139 $identifier .= $tok;
25140 $id_scan_state = ''; # all done
25143 elsif ( $tok =~ /^[\s\$\%\\\*\@\&\;]/ ) {
25144 $identifier .= $tok;
25146 else { # probable error in script, but keep going
25147 warning("Unexpected '$tok' while seeking end of prototype\n");
25148 $identifier .= $tok;
25151 else { # can get here due to error in initialization
25152 $id_scan_state = '';
25158 if ( $id_scan_state eq ')' ) {
25159 warning("Hit end of line while seeking ) to end prototype\n");
25162 # once we enter the actual identifier, it may not extend beyond
25163 # the end of the current line
25164 if ( $id_scan_state =~ /^[A\:\(\)]/ ) {
25165 $id_scan_state = '';
25167 if ( $i < 0 ) { $i = 0 }
25174 if ( $identifier =~ /^->/ && $last_nonblank_type eq 'w' ) {
25177 else { $type = 'i' }
25179 elsif ( $identifier eq '->' ) {
25183 ( length($identifier) > 1 )
25185 # In something like '@$=' we have an identifier '@$'
25186 # In something like '$${' we have type '$$' (and only
25187 # part of an identifier)
25188 && !( $identifier =~ /\$$/ && $tok eq '{' )
25189 && ( $identifier !~ /^(sub |package )$/ )
25194 else { $type = 't' }
25196 elsif ($saw_alpha) {
25198 # type 'w' includes anything without leading type info
25199 # ($,%,@,*) including something like abc::def::ghi
25204 } # this can happen on a restart
25208 $tok = $identifier;
25209 if ($message) { write_logfile_entry($message) }
25216 TOKENIZER_DEBUG_FLAG_SCAN_ID && do {
25217 my ( $a, $b, $c ) = caller;
25219 "SCANID: called from $a $b $c with tok, i, state, identifier =$tok_begin, $i_begin, $id_scan_state_begin, $identifier_begin\n";
25221 "SCANID: returned with tok, i, state, identifier =$tok, $i, $id_scan_state, $identifier\n";
25223 return ( $i, $tok, $type, $id_scan_state, $identifier );
25228 # saved package and subnames in case prototype is on separate line
25229 my ( $package_saved, $subname_saved );
25233 # do_scan_sub parses a sub name and prototype
25234 # it is called with $i_beg equal to the index of the first nonblank
25235 # token following a 'sub' token.
25237 # TODO: add future error checks to be sure we have a valid
25238 # sub name. For example, 'sub &doit' is wrong. Also, be sure
25239 # a name is given if and only if a non-anonymous sub is
25241 # USES GLOBAL VARS: $current_package, $last_nonblank_token,
25242 # $in_attribute_list, %saw_function_definition,
25246 $input_line, $i, $i_beg,
25247 $tok, $type, $rtokens,
25248 $rtoken_map, $id_scan_state, $max_token_index
25250 $id_scan_state = ""; # normally we get everything in one call
25251 my $subname = undef;
25252 my $package = undef;
25257 my $pos_beg = $$rtoken_map[$i_beg];
25258 pos($input_line) = $pos_beg;
25260 # sub NAME PROTO ATTRS
25262 $input_line =~ m/\G\s*
25263 ((?:\w*(?:'|::))*) # package - something that ends in :: or '
25264 (\w+) # NAME - required
25265 (\s*\([^){]*\))? # PROTO - something in parens
25266 (\s*:)? # ATTRS - leading : of attribute list
25275 $package = ( defined($1) && $1 ) ? $1 : $current_package;
25276 $package =~ s/\'/::/g;
25277 if ( $package =~ /^\:/ ) { $package = 'main' . $package }
25278 $package =~ s/::$//;
25279 my $pos = pos($input_line);
25280 my $numc = $pos - $pos_beg;
25281 $tok = 'sub ' . substr( $input_line, $pos_beg, $numc );
25285 # Look for prototype/attributes not preceded on this line by subname;
25286 # This might be an anonymous sub with attributes,
25287 # or a prototype on a separate line from its sub name
25289 $input_line =~ m/\G(\s*\([^){]*\))? # PROTO
25290 (\s*:)? # ATTRS leading ':'
25299 # Handle prototype on separate line from subname
25300 if ($subname_saved) {
25301 $package = $package_saved;
25302 $subname = $subname_saved;
25303 $tok = $last_nonblank_token;
25310 # ATTRS: if there are attributes, back up and let the ':' be
25311 # found later by the scanner.
25312 my $pos = pos($input_line);
25314 $pos -= length($attrs);
25317 my $next_nonblank_token = $tok;
25319 # catch case of line with leading ATTR ':' after anonymous sub
25320 if ( $pos == $pos_beg && $tok eq ':' ) {
25322 $in_attribute_list = 1;
25325 # We must convert back from character position
25326 # to pre_token index.
25329 # I don't think an error flag can occur here ..but ?
25331 ( $i, $error ) = inverse_pretoken_map( $i, $pos, $rtoken_map,
25332 $max_token_index );
25333 if ($error) { warning("Possibly invalid sub\n") }
25335 # check for multiple definitions of a sub
25336 ( $next_nonblank_token, my $i_next ) =
25337 find_next_nonblank_token_on_this_line( $i, $rtokens,
25338 $max_token_index );
25341 if ( $next_nonblank_token =~ /^(\s*|#)$/ )
25342 { # skip blank or side comment
25343 my ( $rpre_tokens, $rpre_types ) =
25344 peek_ahead_for_n_nonblank_pre_tokens(1);
25345 if ( defined($rpre_tokens) && @$rpre_tokens ) {
25346 $next_nonblank_token = $rpre_tokens->[0];
25349 $next_nonblank_token = '}';
25352 $package_saved = "";
25353 $subname_saved = "";
25354 if ( $next_nonblank_token eq '{' ) {
25357 # Check for multiple definitions of a sub, but
25358 # it is ok to have multiple sub BEGIN, etc,
25359 # so we do not complain if name is all caps
25360 if ( $saw_function_definition{$package}{$subname}
25361 && $subname !~ /^[A-Z]+$/ )
25363 my $lno = $saw_function_definition{$package}{$subname};
25365 "already saw definition of 'sub $subname' in package '$package' at line $lno\n"
25368 $saw_function_definition{$package}{$subname} =
25369 $tokenizer_self->{_last_line_number};
25372 elsif ( $next_nonblank_token eq ';' ) {
25374 elsif ( $next_nonblank_token eq '}' ) {
25377 # ATTRS - if an attribute list follows, remember the name
25378 # of the sub so the next opening brace can be labeled.
25379 # Setting 'statement_type' causes any ':'s to introduce
25381 elsif ( $next_nonblank_token eq ':' ) {
25382 $statement_type = $tok;
25385 # see if PROTO follows on another line:
25386 elsif ( $next_nonblank_token eq '(' ) {
25387 if ( $attrs || $proto ) {
25389 "unexpected '(' after definition or declaration of sub '$subname'\n"
25393 $id_scan_state = 'sub'; # we must come back to get proto
25394 $statement_type = $tok;
25395 $package_saved = $package;
25396 $subname_saved = $subname;
25399 elsif ($next_nonblank_token) { # EOF technically ok
25401 "expecting ':' or ';' or '{' after definition or declaration of sub '$subname' but saw '$next_nonblank_token'\n"
25404 check_prototype( $proto, $package, $subname );
25407 # no match but line not blank
25410 return ( $i, $tok, $type, $id_scan_state );
25414 #########i###############################################################
25415 # Tokenizer utility routines which may use CONSTANTS but no other GLOBALS
25416 #########################################################################
25418 sub find_next_nonblank_token {
25419 my ( $i, $rtokens, $max_token_index ) = @_;
25421 if ( $i >= $max_token_index ) {
25422 if ( !peeked_ahead() ) {
25425 peek_ahead_for_nonblank_token( $rtokens, $max_token_index );
25428 my $next_nonblank_token = $$rtokens[ ++$i ];
25430 if ( $next_nonblank_token =~ /^\s*$/ ) {
25431 $next_nonblank_token = $$rtokens[ ++$i ];
25433 return ( $next_nonblank_token, $i );
25436 sub numerator_expected {
25438 # this is a filter for a possible numerator, in support of guessing
25439 # for the / pattern delimiter token.
25444 # Note: I am using the convention that variables ending in
25445 # _expected have these 3 possible values.
25446 my ( $i, $rtokens, $max_token_index ) = @_;
25447 my $next_token = $$rtokens[ $i + 1 ];
25448 if ( $next_token eq '=' ) { $i++; } # handle /=
25449 my ( $next_nonblank_token, $i_next ) =
25450 find_next_nonblank_token( $i, $rtokens, $max_token_index );
25452 if ( $next_nonblank_token =~ /(\(|\$|\w|\.|\@)/ ) {
25457 if ( $next_nonblank_token =~ /^\s*$/ ) {
25466 sub pattern_expected {
25468 # This is the start of a filter for a possible pattern.
25469 # It looks at the token after a possbible pattern and tries to
25470 # determine if that token could end a pattern.
25475 my ( $i, $rtokens, $max_token_index ) = @_;
25476 my $next_token = $$rtokens[ $i + 1 ];
25477 if ( $next_token =~ /^[cgimosx]/ ) { $i++; } # skip possible modifier
25478 my ( $next_nonblank_token, $i_next ) =
25479 find_next_nonblank_token( $i, $rtokens, $max_token_index );
25481 # list of tokens which may follow a pattern
25482 # (can probably be expanded)
25483 if ( $next_nonblank_token =~ /(\)|\}|\;|\&\&|\|\||and|or|while|if|unless)/ )
25489 if ( $next_nonblank_token =~ /^\s*$/ ) {
25498 sub find_next_nonblank_token_on_this_line {
25499 my ( $i, $rtokens, $max_token_index ) = @_;
25500 my $next_nonblank_token;
25502 if ( $i < $max_token_index ) {
25503 $next_nonblank_token = $$rtokens[ ++$i ];
25505 if ( $next_nonblank_token =~ /^\s*$/ ) {
25507 if ( $i < $max_token_index ) {
25508 $next_nonblank_token = $$rtokens[ ++$i ];
25513 $next_nonblank_token = "";
25515 return ( $next_nonblank_token, $i );
25518 sub find_angle_operator_termination {
25520 # We are looking at a '<' and want to know if it is an angle operator.
25521 # We are to return:
25522 # $i = pretoken index of ending '>' if found, current $i otherwise
25523 # $type = 'Q' if found, '>' otherwise
25524 my ( $input_line, $i_beg, $rtoken_map, $expecting, $max_token_index ) = @_;
25527 pos($input_line) = 1 + $$rtoken_map[$i];
25531 # we just have to find the next '>' if a term is expected
25532 if ( $expecting == TERM ) { $filter = '[\>]' }
25534 # we have to guess if we don't know what is expected
25535 elsif ( $expecting == UNKNOWN ) { $filter = '[\>\;\=\#\|\<]' }
25537 # shouldn't happen - we shouldn't be here if operator is expected
25538 else { warning("Program Bug in find_angle_operator_termination\n") }
25540 # To illustrate what we might be looking at, in case we are
25541 # guessing, here are some examples of valid angle operators
25548 # <jskdfjskdfj* op/* jskdjfjkosvk*> ( glob.t)
25549 # <${PREFIX}*img*.$IMAGE_TYPE>
25550 # <img*.$IMAGE_TYPE>
25551 # <Timg*.$IMAGE_TYPE>
25552 # <$LATEX2HTMLVERSIONS${dd}html[1-9].[0-9].pl>
25554 # Here are some examples of lines which do not have angle operators:
25555 # return undef unless $self->[2]++ < $#{$self->[1]};
25558 # the following line from dlister.pl caused trouble:
25559 # print'~'x79,"\n",$D<1024?"0.$D":$D>>10,"K, $C files\n\n\n";
25561 # If the '<' starts an angle operator, it must end on this line and
25562 # it must not have certain characters like ';' and '=' in it. I use
25563 # this to limit the testing. This filter should be improved if
25566 if ( $input_line =~ /($filter)/g ) {
25570 # We MAY have found an angle operator termination if we get
25571 # here, but we need to do more to be sure we haven't been
25573 my $pos = pos($input_line);
25575 my $pos_beg = $$rtoken_map[$i];
25576 my $str = substr( $input_line, $pos_beg, ( $pos - $pos_beg ) );
25578 # Reject if the closing '>' follows a '-' as in:
25579 # if ( VERSION < 5.009 && $op-> name eq 'aassign' ) { }
25580 if ( $expecting eq UNKNOWN ) {
25581 my $check = substr( $input_line, $pos - 2, 1 );
25582 if ( $check eq '-' ) {
25583 return ( $i, $type );
25587 ######################################debug#####
25588 #write_diagnostics( "ANGLE? :$str\n");
25589 #print "ANGLE: found $1 at pos=$pos str=$str check=$check\n";
25590 ######################################debug#####
25594 inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
25596 # It may be possible that a quote ends midway in a pretoken.
25597 # If this happens, it may be necessary to split the pretoken.
25600 "Possible tokinization error..please check this line\n");
25601 report_possible_bug();
25604 # Now let's see where we stand....
25605 # OK if math op not possible
25606 if ( $expecting == TERM ) {
25609 # OK if there are no more than 2 pre-tokens inside
25610 # (not possible to write 2 token math between < and >)
25611 # This catches most common cases
25612 elsif ( $i <= $i_beg + 3 ) {
25613 write_diagnostics("ANGLE(1 or 2 tokens): $str\n");
25619 # Let's try a Brace Test: any braces inside must balance
25621 while ( $str =~ /\{/g ) { $br++ }
25622 while ( $str =~ /\}/g ) { $br-- }
25624 while ( $str =~ /\[/g ) { $sb++ }
25625 while ( $str =~ /\]/g ) { $sb-- }
25627 while ( $str =~ /\(/g ) { $pr++ }
25628 while ( $str =~ /\)/g ) { $pr-- }
25630 # if braces do not balance - not angle operator
25631 if ( $br || $sb || $pr ) {
25635 "NOT ANGLE (BRACE={$br ($pr [$sb ):$str\n");
25638 # we should keep doing more checks here...to be continued
25639 # Tentatively accepting this as a valid angle operator.
25640 # There are lots more things that can be checked.
25643 "ANGLE-Guessing yes: $str expecting=$expecting\n");
25644 write_logfile_entry("Guessing angle operator here: $str\n");
25649 # didn't find ending >
25651 if ( $expecting == TERM ) {
25652 warning("No ending > for angle operator\n");
25656 return ( $i, $type );
25659 sub scan_number_do {
25661 # scan a number in any of the formats that Perl accepts
25662 # Underbars (_) are allowed in decimal numbers.
25663 # input parameters -
25664 # $input_line - the string to scan
25665 # $i - pre_token index to start scanning
25666 # $rtoken_map - reference to the pre_token map giving starting
25667 # character position in $input_line of token $i
25668 # output parameters -
25669 # $i - last pre_token index of the number just scanned
25670 # number - the number (characters); or undef if not a number
25672 my ( $input_line, $i, $rtoken_map, $input_type, $max_token_index ) = @_;
25673 my $pos_beg = $$rtoken_map[$i];
25676 my $number = undef;
25677 my $type = $input_type;
25679 my $first_char = substr( $input_line, $pos_beg, 1 );
25681 # Look for bad starting characters; Shouldn't happen..
25682 if ( $first_char !~ /[\d\.\+\-Ee]/ ) {
25683 warning("Program bug - scan_number given character $first_char\n");
25684 report_definite_bug();
25685 return ( $i, $type, $number );
25688 # handle v-string without leading 'v' character ('Two Dot' rule)
25690 # TODO: v-strings may contain underscores
25691 pos($input_line) = $pos_beg;
25692 if ( $input_line =~ /\G((\d+)?\.\d+(\.\d+)+)/g ) {
25693 $pos = pos($input_line);
25694 my $numc = $pos - $pos_beg;
25695 $number = substr( $input_line, $pos_beg, $numc );
25697 report_v_string($number);
25700 # handle octal, hex, binary
25701 if ( !defined($number) ) {
25702 pos($input_line) = $pos_beg;
25703 if ( $input_line =~ /\G[+-]?0((x[0-9a-fA-F_]+)|([0-7_]+)|(b[01_]+))/g )
25705 $pos = pos($input_line);
25706 my $numc = $pos - $pos_beg;
25707 $number = substr( $input_line, $pos_beg, $numc );
25713 if ( !defined($number) ) {
25714 pos($input_line) = $pos_beg;
25716 if ( $input_line =~ /\G([+-]?[\d_]*(\.[\d_]*)?([Ee][+-]?(\d+))?)/g ) {
25717 $pos = pos($input_line);
25719 # watch out for things like 0..40 which would give 0. by this;
25720 if ( ( substr( $input_line, $pos - 1, 1 ) eq '.' )
25721 && ( substr( $input_line, $pos, 1 ) eq '.' ) )
25725 my $numc = $pos - $pos_beg;
25726 $number = substr( $input_line, $pos_beg, $numc );
25731 # filter out non-numbers like e + - . e2 .e3 +e6
25732 # the rule: at least one digit, and any 'e' must be preceded by a digit
25734 $number !~ /\d/ # no digits
25735 || ( $number =~ /^(.*)[eE]/
25736 && $1 !~ /\d/ ) # or no digits before the 'e'
25740 $type = $input_type;
25741 return ( $i, $type, $number );
25744 # Found a number; now we must convert back from character position
25745 # to pre_token index. An error here implies user syntax error.
25746 # An example would be an invalid octal number like '009'.
25749 inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
25750 if ($error) { warning("Possibly invalid number\n") }
25752 return ( $i, $type, $number );
25755 sub inverse_pretoken_map {
25757 # Starting with the current pre_token index $i, scan forward until
25758 # finding the index of the next pre_token whose position is $pos.
25759 my ( $i, $pos, $rtoken_map, $max_token_index ) = @_;
25762 while ( ++$i <= $max_token_index ) {
25764 if ( $pos <= $$rtoken_map[$i] ) {
25766 # Let the calling routine handle errors in which we do not
25767 # land on a pre-token boundary. It can happen by running
25768 # perltidy on some non-perl scripts, for example.
25769 if ( $pos < $$rtoken_map[$i] ) { $error = 1 }
25774 return ( $i, $error );
25777 sub find_here_doc {
25779 # find the target of a here document, if any
25780 # input parameters:
25781 # $i - token index of the second < of <<
25782 # ($i must be less than the last token index if this is called)
25783 # output parameters:
25784 # $found_target = 0 didn't find target; =1 found target
25785 # HERE_TARGET - the target string (may be empty string)
25786 # $i - unchanged if not here doc,
25787 # or index of the last token of the here target
25788 # $saw_error - flag noting unbalanced quote on here target
25789 my ( $expecting, $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
25791 my $found_target = 0;
25792 my $here_doc_target = '';
25793 my $here_quote_character = '';
25795 my ( $next_nonblank_token, $i_next_nonblank, $next_token );
25796 $next_token = $$rtokens[ $i + 1 ];
25798 # perl allows a backslash before the target string (heredoc.t)
25800 if ( $next_token eq '\\' ) {
25802 $next_token = $$rtokens[ $i + 2 ];
25805 ( $next_nonblank_token, $i_next_nonblank ) =
25806 find_next_nonblank_token_on_this_line( $i, $rtokens, $max_token_index );
25808 if ( $next_nonblank_token =~ /[\'\"\`]/ ) {
25811 my $quote_depth = 0;
25816 $i, $in_quote, $here_quote_character, $quote_pos, $quote_depth,
25819 = follow_quoted_string( $i_next_nonblank, $in_quote, $rtokens,
25820 $here_quote_character, $quote_pos, $quote_depth, $max_token_index );
25822 if ($in_quote) { # didn't find end of quote, so no target found
25824 if ( $expecting == TERM ) {
25826 "Did not find here-doc string terminator ($here_quote_character) before end of line \n"
25831 else { # found ending quote
25836 for ( $j = $i_next_nonblank + 1 ; $j < $i ; $j++ ) {
25837 $tokj = $$rtokens[$j];
25839 # we have to remove any backslash before the quote character
25840 # so that the here-doc-target exactly matches this string
25844 && $$rtokens[ $j + 1 ] eq $here_quote_character );
25845 $here_doc_target .= $tokj;
25850 elsif ( ( $next_token =~ /^\s*$/ ) and ( $expecting == TERM ) ) {
25852 write_logfile_entry(
25853 "found blank here-target after <<; suggest using \"\"\n");
25856 elsif ( $next_token =~ /^\w/ ) { # simple bareword or integer after <<
25858 my $here_doc_expected;
25859 if ( $expecting == UNKNOWN ) {
25860 $here_doc_expected = guess_if_here_doc($next_token);
25863 $here_doc_expected = 1;
25866 if ($here_doc_expected) {
25868 $here_doc_target = $next_token;
25875 if ( $expecting == TERM ) {
25877 write_logfile_entry("Note: bare here-doc operator <<\n");
25884 # patch to neglect any prepended backslash
25885 if ( $found_target && $backslash ) { $i++ }
25887 return ( $found_target, $here_doc_target, $here_quote_character, $i,
25893 # follow (or continue following) quoted string(s)
25894 # $in_quote return code:
25895 # 0 - ok, found end
25896 # 1 - still must find end of quote whose target is $quote_character
25897 # 2 - still looking for end of first of two quotes
25899 # Returns updated strings:
25900 # $quoted_string_1 = quoted string seen while in_quote=1
25901 # $quoted_string_2 = quoted string seen while in_quote=2
25903 $i, $in_quote, $quote_character,
25904 $quote_pos, $quote_depth, $quoted_string_1,
25905 $quoted_string_2, $rtokens, $rtoken_map,
25909 my $in_quote_starting = $in_quote;
25912 if ( $in_quote == 2 ) { # two quotes/quoted_string_1s to follow
25915 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
25918 = follow_quoted_string( $i, $in_quote, $rtokens, $quote_character,
25919 $quote_pos, $quote_depth, $max_token_index );
25920 $quoted_string_2 .= $quoted_string;
25921 if ( $in_quote == 1 ) {
25922 if ( $quote_character =~ /[\{\[\<\(]/ ) { $i++; }
25923 $quote_character = '';
25926 $quoted_string_2 .= "\n";
25930 if ( $in_quote == 1 ) { # one (more) quote to follow
25933 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
25936 = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
25937 $quote_pos, $quote_depth, $max_token_index );
25938 $quoted_string_1 .= $quoted_string;
25939 if ( $in_quote == 1 ) {
25940 $quoted_string_1 .= "\n";
25943 return ( $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
25944 $quoted_string_1, $quoted_string_2 );
25947 sub follow_quoted_string {
25949 # scan for a specific token, skipping escaped characters
25950 # if the quote character is blank, use the first non-blank character
25951 # input parameters:
25952 # $rtokens = reference to the array of tokens
25953 # $i = the token index of the first character to search
25954 # $in_quote = number of quoted strings being followed
25955 # $beginning_tok = the starting quote character
25956 # $quote_pos = index to check next for alphanumeric delimiter
25957 # output parameters:
25958 # $i = the token index of the ending quote character
25959 # $in_quote = decremented if found end, unchanged if not
25960 # $beginning_tok = the starting quote character
25961 # $quote_pos = index to check next for alphanumeric delimiter
25962 # $quote_depth = nesting depth, since delimiters '{ ( [ <' can be nested.
25963 # $quoted_string = the text of the quote (without quotation tokens)
25964 my ( $i_beg, $in_quote, $rtokens, $beginning_tok, $quote_pos, $quote_depth,
25967 my ( $tok, $end_tok );
25968 my $i = $i_beg - 1;
25969 my $quoted_string = "";
25971 TOKENIZER_DEBUG_FLAG_QUOTE && do {
25973 "QUOTE entering with quote_pos = $quote_pos i=$i beginning_tok =$beginning_tok\n";
25976 # get the corresponding end token
25977 if ( $beginning_tok !~ /^\s*$/ ) {
25978 $end_tok = matching_end_token($beginning_tok);
25981 # a blank token means we must find and use the first non-blank one
25983 my $allow_quote_comments = ( $i < 0 ) ? 1 : 0; # i<0 means we saw a <cr>
25985 while ( $i < $max_token_index ) {
25986 $tok = $$rtokens[ ++$i ];
25988 if ( $tok !~ /^\s*$/ ) {
25990 if ( ( $tok eq '#' ) && ($allow_quote_comments) ) {
25991 $i = $max_token_index;
25995 if ( length($tok) > 1 ) {
25996 if ( $quote_pos <= 0 ) { $quote_pos = 1 }
25997 $beginning_tok = substr( $tok, $quote_pos - 1, 1 );
26000 $beginning_tok = $tok;
26003 $end_tok = matching_end_token($beginning_tok);
26009 $allow_quote_comments = 1;
26014 # There are two different loops which search for the ending quote
26015 # character. In the rare case of an alphanumeric quote delimiter, we
26016 # have to look through alphanumeric tokens character-by-character, since
26017 # the pre-tokenization process combines multiple alphanumeric
26018 # characters, whereas for a non-alphanumeric delimiter, only tokens of
26019 # length 1 can match.
26021 ###################################################################
26022 # Case 1 (rare): loop for case of alphanumeric quote delimiter..
26023 # "quote_pos" is the position the current word to begin searching
26024 ###################################################################
26025 if ( $beginning_tok =~ /\w/ ) {
26027 # Note this because it is not recommended practice except
26028 # for obfuscated perl contests
26029 if ( $in_quote == 1 ) {
26030 write_logfile_entry(
26031 "Note: alphanumeric quote delimiter ($beginning_tok) \n");
26034 while ( $i < $max_token_index ) {
26036 if ( $quote_pos == 0 || ( $i < 0 ) ) {
26037 $tok = $$rtokens[ ++$i ];
26039 if ( $tok eq '\\' ) {
26041 # retain backslash unless it hides the end token
26042 $quoted_string .= $tok
26043 unless $$rtokens[ $i + 1 ] eq $end_tok;
26045 last if ( $i >= $max_token_index );
26046 $tok = $$rtokens[ ++$i ];
26049 my $old_pos = $quote_pos;
26051 unless ( defined($tok) && defined($end_tok) && defined($quote_pos) )
26055 $quote_pos = 1 + index( $tok, $end_tok, $quote_pos );
26057 if ( $quote_pos > 0 ) {
26060 substr( $tok, $old_pos, $quote_pos - $old_pos - 1 );
26064 if ( $quote_depth == 0 ) {
26070 $quoted_string .= substr( $tok, $old_pos );
26075 ########################################################################
26076 # Case 2 (normal): loop for case of a non-alphanumeric quote delimiter..
26077 ########################################################################
26080 while ( $i < $max_token_index ) {
26081 $tok = $$rtokens[ ++$i ];
26083 if ( $tok eq $end_tok ) {
26086 if ( $quote_depth == 0 ) {
26091 elsif ( $tok eq $beginning_tok ) {
26094 elsif ( $tok eq '\\' ) {
26096 # retain backslash unless it hides the beginning or end token
26097 $tok = $$rtokens[ ++$i ];
26098 $quoted_string .= '\\'
26099 unless ( $tok eq $end_tok || $tok eq $beginning_tok );
26101 $quoted_string .= $tok;
26104 if ( $i > $max_token_index ) { $i = $max_token_index }
26105 return ( $i, $in_quote, $beginning_tok, $quote_pos, $quote_depth,
26109 sub indicate_error {
26110 my ( $msg, $line_number, $input_line, $pos, $carrat ) = @_;
26111 interrupt_logfile();
26113 write_error_indicator_pair( $line_number, $input_line, $pos, $carrat );
26117 sub write_error_indicator_pair {
26118 my ( $line_number, $input_line, $pos, $carrat ) = @_;
26119 my ( $offset, $numbered_line, $underline ) =
26120 make_numbered_line( $line_number, $input_line, $pos );
26121 $underline = write_on_underline( $underline, $pos - $offset, $carrat );
26122 warning( $numbered_line . "\n" );
26123 $underline =~ s/\s*$//;
26124 warning( $underline . "\n" );
26127 sub make_numbered_line {
26129 # Given an input line, its line number, and a character position of
26130 # interest, create a string not longer than 80 characters of the form
26131 # $lineno: sub_string
26132 # such that the sub_string of $str contains the position of interest
26134 # Here is an example of what we want, in this case we add trailing
26135 # '...' because the line is long.
26137 # 2: (One of QAML 2.0's authors is a member of the World Wide Web Con ...
26139 # Here is another example, this time in which we used leading '...'
26140 # because of excessive length:
26142 # 2: ... er of the World Wide Web Consortium's
26144 # input parameters are:
26145 # $lineno = line number
26146 # $str = the text of the line
26147 # $pos = position of interest (the error) : 0 = first character
26150 # - $offset = an offset which corrects the position in case we only
26151 # display part of a line, such that $pos-$offset is the effective
26152 # position from the start of the displayed line.
26153 # - $numbered_line = the numbered line as above,
26154 # - $underline = a blank 'underline' which is all spaces with the same
26155 # number of characters as the numbered line.
26157 my ( $lineno, $str, $pos ) = @_;
26158 my $offset = ( $pos < 60 ) ? 0 : $pos - 40;
26159 my $excess = length($str) - $offset - 68;
26160 my $numc = ( $excess > 0 ) ? 68 : undef;
26162 if ( defined($numc) ) {
26163 if ( $offset == 0 ) {
26164 $str = substr( $str, $offset, $numc - 4 ) . " ...";
26167 $str = "... " . substr( $str, $offset + 4, $numc - 4 ) . " ...";
26172 if ( $offset == 0 ) {
26175 $str = "... " . substr( $str, $offset + 4 );
26179 my $numbered_line = sprintf( "%d: ", $lineno );
26180 $offset -= length($numbered_line);
26181 $numbered_line .= $str;
26182 my $underline = " " x length($numbered_line);
26183 return ( $offset, $numbered_line, $underline );
26186 sub write_on_underline {
26188 # The "underline" is a string that shows where an error is; it starts
26189 # out as a string of blanks with the same length as the numbered line of
26190 # code above it, and we have to add marking to show where an error is.
26191 # In the example below, we want to write the string '--^' just below
26192 # the line of bad code:
26194 # 2: (One of QAML 2.0's authors is a member of the World Wide Web Con ...
26196 # We are given the current underline string, plus a position and a
26197 # string to write on it.
26199 # In the above example, there will be 2 calls to do this:
26200 # First call: $pos=19, pos_chr=^
26201 # Second call: $pos=16, pos_chr=---
26203 # This is a trivial thing to do with substr, but there is some
26206 my ( $underline, $pos, $pos_chr ) = @_;
26208 # check for error..shouldn't happen
26209 unless ( ( $pos >= 0 ) && ( $pos <= length($underline) ) ) {
26212 my $excess = length($pos_chr) + $pos - length($underline);
26213 if ( $excess > 0 ) {
26214 $pos_chr = substr( $pos_chr, 0, length($pos_chr) - $excess );
26216 substr( $underline, $pos, length($pos_chr) ) = $pos_chr;
26217 return ($underline);
26222 # Break a string, $str, into a sequence of preliminary tokens. We
26223 # are interested in these types of tokens:
26224 # words (type='w'), example: 'max_tokens_wanted'
26225 # digits (type = 'd'), example: '0755'
26226 # whitespace (type = 'b'), example: ' '
26227 # any other single character (i.e. punct; type = the character itself).
26228 # We cannot do better than this yet because we might be in a quoted
26229 # string or pattern. Caller sets $max_tokens_wanted to 0 to get all
26231 my ( $str, $max_tokens_wanted ) = @_;
26233 # we return references to these 3 arrays:
26234 my @tokens = (); # array of the tokens themselves
26235 my @token_map = (0); # string position of start of each token
26236 my @type = (); # 'b'=whitespace, 'd'=digits, 'w'=alpha, or punct
26241 if ( $str =~ /\G(\s+)/gc ) { push @type, 'b'; }
26244 # note that this must come before words!
26245 elsif ( $str =~ /\G(\d+)/gc ) { push @type, 'd'; }
26248 elsif ( $str =~ /\G(\w+)/gc ) { push @type, 'w'; }
26250 # single-character punctuation
26251 elsif ( $str =~ /\G(\W)/gc ) { push @type, $1; }
26255 return ( \@tokens, \@token_map, \@type );
26259 push @token_map, pos($str);
26261 } while ( --$max_tokens_wanted != 0 );
26263 return ( \@tokens, \@token_map, \@type );
26268 # this is an old debug routine
26269 my ( $rtokens, $rtoken_map ) = @_;
26270 my $num = scalar(@$rtokens);
26273 for ( $i = 0 ; $i < $num ; $i++ ) {
26274 my $len = length( $$rtokens[$i] );
26275 print "$i:$len:$$rtoken_map[$i]:$$rtokens[$i]:\n";
26279 sub matching_end_token {
26281 # find closing character for a pattern
26282 my $beginning_token = shift;
26284 if ( $beginning_token eq '{' ) {
26287 elsif ( $beginning_token eq '[' ) {
26290 elsif ( $beginning_token eq '<' ) {
26293 elsif ( $beginning_token eq '(' ) {
26301 sub dump_token_types {
26305 # This should be the latest list of token types in use
26306 # adding NEW_TOKENS: add a comment here
26307 print $fh <<'END_OF_LIST';
26309 Here is a list of the token types currently used for lines of type 'CODE'.
26310 For the following tokens, the "type" of a token is just the token itself.
26312 .. :: << >> ** && .. || // -> => += -= .= %= &= |= ^= *= <>
26313 ( ) <= >= == =~ !~ != ++ -- /= x=
26314 ... **= <<= >>= &&= ||= //= <=>
26315 , + - / * | % ! x ~ = \ ? : . < > ^ &
26317 The following additional token types are defined:
26320 b blank (white space)
26321 { indent: opening structural curly brace or square bracket or paren
26322 (code block, anonymous hash reference, or anonymous array reference)
26323 } outdent: right structural curly brace or square bracket or paren
26324 [ left non-structural square bracket (enclosing an array index)
26325 ] right non-structural square bracket
26326 ( left non-structural paren (all but a list right of an =)
26327 ) right non-structural parena
26328 L left non-structural curly brace (enclosing a key)
26329 R right non-structural curly brace
26330 ; terminal semicolon
26331 f indicates a semicolon in a "for" statement
26332 h here_doc operator <<
26334 Q indicates a quote or pattern
26335 q indicates a qw quote block
26337 C user-defined constant or constant function (with void prototype = ())
26338 U user-defined function taking parameters
26339 G user-defined function taking block parameter (like grep/map/eval)
26340 M (unused, but reserved for subroutine definition name)
26341 P (unused, but -html uses it to label pod text)
26342 t type indicater such as %,$,@,*,&,sub
26343 w bare word (perhaps a subroutine call)
26344 i identifier of some type (with leading %, $, @, *, &, sub, -> )
26347 F a file test operator (like -e)
26349 Z identifier in indirect object slot: may be file handle, object
26350 J LABEL: code block label
26351 j LABEL after next, last, redo, goto
26354 pp pre-increment operator ++
26355 mm pre-decrement operator --
26356 A : used as attribute separator
26358 Here are the '_line_type' codes used internally:
26359 SYSTEM - system-specific code before hash-bang line
26360 CODE - line of perl code (including comments)
26361 POD_START - line starting pod, such as '=head'
26362 POD - pod documentation text
26363 POD_END - last line of pod section, '=cut'
26364 HERE - text of here-document
26365 HERE_END - last line of here-doc (target word)
26366 FORMAT - format section
26367 FORMAT_END - last line of format section, '.'
26368 DATA_START - __DATA__ line
26369 DATA - unidentified text following __DATA__
26370 END_START - __END__ line
26371 END - unidentified text following __END__
26372 ERROR - we are in big trouble, probably not a perl script
26378 # These names are used in error messages
26379 @opening_brace_names = qw# '{' '[' '(' '?' #;
26380 @closing_brace_names = qw# '}' ']' ')' ':' #;
26383 .. :: << >> ** && .. || // -> => += -= .= %= &= |= ^= *= <>
26384 <= >= == =~ !~ != ++ -- /= x= ~~
26386 @is_digraph{@digraphs} = (1) x scalar(@digraphs);
26388 my @trigraphs = qw( ... **= <<= >>= &&= ||= //= <=> !~~ );
26389 @is_trigraph{@trigraphs} = (1) x scalar(@trigraphs);
26391 # make a hash of all valid token types for self-checking the tokenizer
26392 # (adding NEW_TOKENS : select a new character and add to this list)
26393 my @valid_token_types = qw#
26394 A b C G L R f h Q k t w i q n p m F pp mm U j J Y Z v
26395 { } ( ) [ ] ; + - / * | % ! x ~ = \ ? : . < > ^ &
26397 push( @valid_token_types, @digraphs );
26398 push( @valid_token_types, @trigraphs );
26399 push( @valid_token_types, '#' );
26400 push( @valid_token_types, ',' );
26401 @is_valid_token_type{@valid_token_types} = (1) x scalar(@valid_token_types);
26403 # a list of file test letters, as in -e (Table 3-4 of 'camel 3')
26404 my @file_test_operators =
26405 qw( A B C M O R S T W X b c d e f g k l o p r s t u w x z);
26406 @is_file_test_operator{@file_test_operators} =
26407 (1) x scalar(@file_test_operators);
26409 # these functions have prototypes of the form (&), so when they are
26410 # followed by a block, that block MAY BE followed by an operator.
26411 @_ = qw( do eval );
26412 @is_block_operator{@_} = (1) x scalar(@_);
26414 # these functions allow an identifier in the indirect object slot
26415 @_ = qw( print printf sort exec system say);
26416 @is_indirect_object_taker{@_} = (1) x scalar(@_);
26418 # These tokens may precede a code block
26419 # patched for SWITCH/CASE
26420 @_ = qw( BEGIN END CHECK INIT AUTOLOAD DESTROY continue if elsif else
26421 unless do while until eval for foreach map grep sort
26422 switch case given when);
26423 @is_code_block_token{@_} = (1) x scalar(@_);
26425 # I'll build the list of keywords incrementally
26428 # keywords and tokens after which a value or pattern is expected,
26429 # but not an operator. In other words, these should consume terms
26430 # to their right, or at least they are not expected to be followed
26431 # immediately by operators.
26432 my @value_requestor = qw(
26651 # patched above for SWITCH/CASE given/when err say
26652 # 'err' is a fairly safe addition.
26653 # TODO: 'default' still needed if appropriate
26654 # 'use feature' seen, but perltidy works ok without it.
26655 # Concerned that 'default' could break code.
26656 push( @Keywords, @value_requestor );
26658 # These are treated the same but are not keywords:
26663 push( @value_requestor, @extra_vr );
26665 @expecting_term_token{@value_requestor} = (1) x scalar(@value_requestor);
26667 # this list contains keywords which do not look for arguments,
26668 # so that they might be followed by an operator, or at least
26670 my @operator_requestor = qw(
26694 push( @Keywords, @operator_requestor );
26696 # These are treated the same but are not considered keywords:
26703 push( @operator_requestor, @extra_or );
26705 @expecting_operator_token{@operator_requestor} =
26706 (1) x scalar(@operator_requestor);
26708 # these token TYPES expect trailing operator but not a term
26709 # note: ++ and -- are post-increment and decrement, 'C' = constant
26710 my @operator_requestor_types = qw( ++ -- C <> q );
26711 @expecting_operator_types{@operator_requestor_types} =
26712 (1) x scalar(@operator_requestor_types);
26714 # these token TYPES consume values (terms)
26715 # note: pp and mm are pre-increment and decrement
26716 # f=semicolon in for, F=file test operator
26717 my @value_requestor_type = qw#
26718 L { ( [ ~ !~ =~ ; . .. ... A : && ! || // = + - x
26719 **= += -= .= /= *= %= x= &= |= ^= <<= >>= &&= ||= //=
26720 <= >= == != => \ > < % * / ? & | ** <=> ~~ !~~
26721 f F pp mm Y p m U J G j >> << ^ t
26723 push( @value_requestor_type, ',' )
26724 ; # (perl doesn't like a ',' in a qw block)
26725 @expecting_term_types{@value_requestor_type} =
26726 (1) x scalar(@value_requestor_type);
26728 # Note: the following valid token types are not assigned here to
26729 # hashes requesting to be followed by values or terms, but are
26730 # instead currently hard-coded into sub operator_expected:
26731 # ) -> :: Q R Z ] b h i k n v w } #
26733 # For simple syntax checking, it is nice to have a list of operators which
26734 # will really be unhappy if not followed by a term. This includes most
26736 %really_want_term = %expecting_term_types;
26738 # with these exceptions...
26739 delete $really_want_term{'U'}; # user sub, depends on prototype
26740 delete $really_want_term{'F'}; # file test works on $_ if no following term
26741 delete $really_want_term{'Y'}; # indirect object, too risky to check syntax;
26744 @_ = qw(q qq qw qx qr s y tr m);
26745 @is_q_qq_qw_qx_qr_s_y_tr_m{@_} = (1) x scalar(@_);
26747 # These keywords are handled specially in the tokenizer code:
26748 my @special_keywords = qw(
26764 push( @Keywords, @special_keywords );
26766 # Keywords after which list formatting may be used
26767 # WARNING: do not include |map|grep|eval or perl may die on
26768 # syntax errors (map1.t).
26769 my @keyword_taking_list = qw(
26841 @is_keyword_taking_list{@keyword_taking_list} =
26842 (1) x scalar(@keyword_taking_list);
26844 # These are not used in any way yet
26845 # my @unused_keywords = qw(
26852 # The list of keywords was extracted from function 'keyword' in
26853 # perl file toke.c version 5.005.03, using this utility, plus a
26854 # little editing: (file getkwd.pl):
26855 # while (<>) { while (/\"(.*)\"/g) { print "$1\n"; } }
26856 # Add 'get' prefix where necessary, then split into the above lists.
26857 # This list should be updated as necessary.
26858 # The list should not contain these special variables:
26859 # ARGV DATA ENV SIG STDERR STDIN STDOUT
26862 @is_keyword{@Keywords} = (1) x scalar(@Keywords);
26869 Perl::Tidy - Parses and beautifies perl source
26875 Perl::Tidy::perltidy(
26877 destination => $destination,
26880 perltidyrc => $perltidyrc,
26881 logfile => $logfile,
26882 errorfile => $errorfile,
26883 formatter => $formatter, # callback object (see below)
26884 dump_options => $dump_options,
26885 dump_options_type => $dump_options_type,
26890 This module makes the functionality of the perltidy utility available to perl
26891 scripts. Any or all of the input parameters may be omitted, in which case the
26892 @ARGV array will be used to provide input parameters as described
26893 in the perltidy(1) man page.
26895 For example, the perltidy script is basically just this:
26898 Perl::Tidy::perltidy();
26900 The module accepts input and output streams by a variety of methods.
26901 The following list of parameters may be any of a the following: a
26902 filename, an ARRAY reference, a SCALAR reference, or an object with
26903 either a B<getline> or B<print> method, as appropriate.
26905 source - the source of the script to be formatted
26906 destination - the destination of the formatted output
26907 stderr - standard error output
26908 perltidyrc - the .perltidyrc file
26909 logfile - the .LOG file stream, if any
26910 errorfile - the .ERR file stream, if any
26911 dump_options - ref to a hash to receive parameters (see below),
26912 dump_options_type - controls contents of dump_options
26913 dump_getopt_flags - ref to a hash to receive Getopt flags
26914 dump_options_category - ref to a hash giving category of options
26915 dump_abbreviations - ref to a hash giving all abbreviations
26917 The following chart illustrates the logic used to decide how to
26920 ref($param) $param is assumed to be:
26921 ----------- ---------------------
26923 SCALAR ref to string
26925 (other) object with getline (if source) or print method
26927 If the parameter is an object, and the object has a B<close> method, that
26928 close method will be called at the end of the stream.
26934 If the B<source> parameter is given, it defines the source of the
26939 If the B<destination> parameter is given, it will be used to define the
26940 file or memory location to receive output of perltidy.
26944 The B<stderr> parameter allows the calling program to capture the output
26945 to what would otherwise go to the standard error output device.
26949 If the B<perltidyrc> file is given, it will be used instead of any
26950 F<.perltidyrc> configuration file that would otherwise be used.
26954 If the B<argv> parameter is given, it will be used instead of the
26955 B<@ARGV> array. The B<argv> parameter may be a string, a reference to a
26956 string, or a reference to an array. If it is a string or reference to a
26957 string, it will be parsed into an array of items just as if it were a
26958 command line string.
26962 If the B<dump_options> parameter is given, it must be the reference to a hash.
26963 In this case, the parameters contained in any perltidyrc configuration file
26964 will be placed in this hash and perltidy will return immediately. This is
26965 equivalent to running perltidy with --dump-options, except that the perameters
26966 are returned in a hash rather than dumped to standard output. Also, by default
26967 only the parameters in the perltidyrc file are returned, but this can be
26968 changed (see the next parameter). This parameter provides a convenient method
26969 for external programs to read a perltidyrc file. An example program using
26970 this feature, F<perltidyrc_dump.pl>, is included in the distribution.
26972 Any combination of the B<dump_> parameters may be used together.
26974 =item dump_options_type
26976 This parameter is a string which can be used to control the parameters placed
26977 in the hash reference supplied by B<dump_options>. The possible values are
26978 'perltidyrc' (default) and 'full'. The 'full' parameter causes both the
26979 default options plus any options found in a perltidyrc file to be returned.
26981 =item dump_getopt_flags
26983 If the B<dump_getopt_flags> parameter is given, it must be the reference to a
26984 hash. This hash will receive all of the parameters that perltidy understands
26985 and flags that are passed to Getopt::Long. This parameter may be
26986 used alone or with the B<dump_options> flag. Perltidy will
26987 exit immediately after filling this hash. See the demo program
26988 F<perltidyrc_dump.pl> for example usage.
26990 =item dump_options_category
26992 If the B<dump_options_category> parameter is given, it must be the reference to a
26993 hash. This hash will receive a hash with keys equal to all long parameter names
26994 and values equal to the title of the corresponding section of the perltidy manual.
26995 See the demo program F<perltidyrc_dump.pl> for example usage.
26997 =item dump_abbreviations
26999 If the B<dump_abbreviations> parameter is given, it must be the reference to a
27000 hash. This hash will receive all abbreviations used by Perl::Tidy. See the
27001 demo program F<perltidyrc_dump.pl> for example usage.
27007 The following example passes perltidy a snippet as a reference
27008 to a string and receives the result back in a reference to
27013 # some messy source code to format
27014 my $source = <<'EOM';
27016 my @editors=('Emacs', 'Vi '); my $rand = rand();
27017 print "A poll of 10 random programmers gave these results:\n";
27019 my $i=int ($rand+rand());
27020 print " $editors[$i] users are from Venus" . ", " .
27021 "$editors[1-$i] users are from Mars" .
27026 # We'll pass it as ref to SCALAR and receive it in a ref to ARRAY
27028 perltidy( source => \$source, destination => \@dest );
27029 foreach (@dest) {print}
27031 =head1 Using the B<formatter> Callback Object
27033 The B<formatter> parameter is an optional callback object which allows
27034 the calling program to receive tokenized lines directly from perltidy for
27035 further specialized processing. When this parameter is used, the two
27036 formatting options which are built into perltidy (beautification or
27037 html) are ignored. The following diagram illustrates the logical flow:
27039 |-- (normal route) -> code beautification
27040 caller->perltidy->|-- (-html flag ) -> create html
27041 |-- (formatter given)-> callback to write_line
27043 This can be useful for processing perl scripts in some way. The
27044 parameter C<$formatter> in the perltidy call,
27046 formatter => $formatter,
27048 is an object created by the caller with a C<write_line> method which
27049 will accept and process tokenized lines, one line per call. Here is
27050 a simple example of a C<write_line> which merely prints the line number,
27051 the line type (as determined by perltidy), and the text of the line:
27055 # This is called from perltidy line-by-line
27057 my $line_of_tokens = shift;
27058 my $line_type = $line_of_tokens->{_line_type};
27059 my $input_line_number = $line_of_tokens->{_line_number};
27060 my $input_line = $line_of_tokens->{_line_text};
27061 print "$input_line_number:$line_type:$input_line";
27064 The complete program, B<perllinetype>, is contained in the examples section of
27065 the source distribution. As this example shows, the callback method
27066 receives a parameter B<$line_of_tokens>, which is a reference to a hash
27067 of other useful information. This example uses these hash entries:
27069 $line_of_tokens->{_line_number} - the line number (1,2,...)
27070 $line_of_tokens->{_line_text} - the text of the line
27071 $line_of_tokens->{_line_type} - the type of the line, one of:
27073 SYSTEM - system-specific code before hash-bang line
27074 CODE - line of perl code (including comments)
27075 POD_START - line starting pod, such as '=head'
27076 POD - pod documentation text
27077 POD_END - last line of pod section, '=cut'
27078 HERE - text of here-document
27079 HERE_END - last line of here-doc (target word)
27080 FORMAT - format section
27081 FORMAT_END - last line of format section, '.'
27082 DATA_START - __DATA__ line
27083 DATA - unidentified text following __DATA__
27084 END_START - __END__ line
27085 END - unidentified text following __END__
27086 ERROR - we are in big trouble, probably not a perl script
27088 Most applications will be only interested in lines of type B<CODE>. For
27089 another example, let's write a program which checks for one of the
27090 so-called I<naughty matching variables> C<&`>, C<$&>, and C<$'>, which
27091 can slow down processing. Here is a B<write_line>, from the example
27092 program B<find_naughty.pl>, which does that:
27096 # This is called back from perltidy line-by-line
27097 # We're looking for $`, $&, and $'
27098 my ( $self, $line_of_tokens ) = @_;
27100 # pull out some stuff we might need
27101 my $line_type = $line_of_tokens->{_line_type};
27102 my $input_line_number = $line_of_tokens->{_line_number};
27103 my $input_line = $line_of_tokens->{_line_text};
27104 my $rtoken_type = $line_of_tokens->{_rtoken_type};
27105 my $rtokens = $line_of_tokens->{_rtokens};
27108 # skip comments, pod, etc
27109 return if ( $line_type ne 'CODE' );
27111 # loop over tokens looking for $`, $&, and $'
27112 for ( my $j = 0 ; $j < @$rtoken_type ; $j++ ) {
27114 # we only want to examine token types 'i' (identifier)
27115 next unless $$rtoken_type[$j] eq 'i';
27117 # pull out the actual token text
27118 my $token = $$rtokens[$j];
27121 if ( $token =~ /^\$[\`\&\']$/ ) {
27123 "$input_line_number: $token\n";
27128 This example pulls out these tokenization variables from the $line_of_tokens
27131 $rtoken_type = $line_of_tokens->{_rtoken_type};
27132 $rtokens = $line_of_tokens->{_rtokens};
27134 The variable C<$rtoken_type> is a reference to an array of token type codes,
27135 and C<$rtokens> is a reference to a corresponding array of token text.
27136 These are obviously only defined for lines of type B<CODE>.
27137 Perltidy classifies tokens into types, and has a brief code for each type.
27138 You can get a complete list at any time by running perltidy from the
27141 perltidy --dump-token-types
27143 In the present example, we are only looking for tokens of type B<i>
27144 (identifiers), so the for loop skips past all other types. When an
27145 identifier is found, its actual text is checked to see if it is one
27146 being sought. If so, the above write_line prints the token and its
27149 The B<formatter> feature is relatively new in perltidy, and further
27150 documentation needs to be written to complete its description. However,
27151 several example programs have been written and can be found in the
27152 B<examples> section of the source distribution. Probably the best way
27153 to get started is to find one of the examples which most closely matches
27154 your application and start modifying it.
27156 For help with perltidy's pecular way of breaking lines into tokens, you
27157 might run, from the command line,
27159 perltidy -D filename
27161 where F<filename> is a short script of interest. This will produce
27162 F<filename.DEBUG> with interleaved lines of text and their token types.
27163 The B<-D> flag has been in perltidy from the beginning for this purpose.
27164 If you want to see the code which creates this file, it is
27165 C<write_debug_entry> in Tidy.pm.
27173 Thanks to Hugh Myers who developed the initial modular interface
27178 This man page documents Perl::Tidy version 20070424.
27183 perltidy at users.sourceforge.net
27187 The perltidy(1) man page describes all of the features of perltidy. It
27188 can be found at http://perltidy.sourceforge.net.