6 Statistics::R - Perl interface with the R statistical program
10 I<Statistics::R> is a module to controls the R interpreter (R project for
11 statistical computing: L<http://www.r-project.org/>). It lets you start R, pass
12 commands to it and retrieve their output. A shared mode allows several instances
13 of I<Statistics::R> to talk to the same R process.
15 The current I<Statistics::R> implementation uses pipes (stdin, stdout and stderr)
16 to communicate with R. This implementation is more efficient and reliable than
17 that in versions < 0.20, which relied on reading and writing intermediary files.
18 As before, this module works on GNU/Linux, MS Windows and probably many more
19 systems. I<Statistics::R> has been tested with R version 2 and 3.
25 # Create a communication bridge with R and start R
26 my $R = Statistics::R->new();
28 # Run simple R commands
29 my $output_file = "file.ps";
30 $R->run(qq`postscript("$output_file", horizontal=FALSE, width=500, height=500)`);
31 $R->run(q`plot(c(1, 5, 10), type = "l")`);
32 $R->run(q`dev.off()`);
34 # Pass and retrieve data (scalars or arrays)
36 $R->set('x', $input_value);
38 my $output_value = $R->get('y');
39 print "y = $output_value\n";
49 Build a I<Statistics::R> bridge object connecting Perl and R. Available options
56 Specify the full path to the R executable, if it is not automatically found. See
61 Start a shared bridge. When using a shared bridge, several instances of
62 Statistics::R can communicate with the same unique R instance. Example:
66 my $R1 = Statistics::R->new( shared => 1);
67 my $R2 = Statistics::R->new( shared => 1);
69 $R1->set( 'x', 'pear' );
70 my $x = $R2->get( 'x' );
73 $R1->stop; # or $R2->stop
75 Note that in shared mode, you are responsible for calling the I<stop()> method
76 from one of your Statistics::R instances when you are finished. But be careful
77 not to call the I<stop()> method if you still have processes that need to
85 First, I<start()> R if it is not yet running. Then, execute R commands passed
86 as a string and return the output as a string. If your commands failed to run
87 in R, an error message will be displayed.
91 my $out = $R->run( q`print( 1 + 2 )` );
93 If you intend on runnning many R commands, it may be convenient to pass a list
94 of commands or put multiple commands in an here-doc:
104 # Here-doc with multiple R commands:
111 my $out2 = $R->run($cmds);
113 Alternatively, to run commands from a file, use the I<run_from_file()> method.
115 The return value you get from I<run()> is a combination of what R would display
116 on the standard output and the standard error, but the exact order may differ.
118 When loading modules, some may write numerous messages on standard error. You
119 can disable this behavior using the following R command:
121 suppressPackageStartupMessages(library(library_to_load))
123 Note that older versions of R impose a limit on how many characters can be
124 contained on a line: about 4076 bytes maximum. You will be warned if this
125 occurs, with an error message stating:
127 '\0' is an unrecognized escape in character string starting "...
129 In this case, try to break down your R code into several smaller, more
130 manageable statements. Alternatively, adding newline characters "\n" at
131 strategic places in the R statements will work around the issue.
133 =item run_from_file()
135 Similar to I<run()> but reads the R commands from the specified file.
136 Internally, this method converts the filename to a format compatible with R and
137 then passes it to the R I<source()> command to read the file and execute the
142 Get the results from the last R command.
146 Set the value of an R variable (scalar or vector). Example:
149 $R->set( 'x', 'pear' );
154 $R->set( 'y', [1, 2, 3] );
158 Get the value of an R variable (scalar or vector). Example:
160 # Retrieve an R scalar. $x is a Perl scalar.
161 my $x = $R->get( 'x' );
165 # Retrieve an R list. $x is a Perl arrayref.
166 my $y = $R->get( 'y' );
170 Explicitly start R. Most times, you do not need to do that because the first
171 execution of I<run()> or I<set()> will automatically call I<start()>.
175 Stop a running instance of R. You need to call this method after running a
176 shared bridge. For a simple bridge, you do not need to do this because
177 I<stop()> is automatically called when the Statistics::R object goes out of
182 I<stop()> and I<start()> R.
186 Get or set the path to the R executable. Note that the path will be available
187 only after start() has been called.
191 Get the version number of R.
195 Was R started in shared mode?
203 Return the PID of the running R process
209 Since I<Statistics::R> relies on R to work, you need to install R first. See
210 this page for downloads, L<http://www.r-project.org/>. If R is in your PATH
211 environment variable, then it should be available from a terminal and be
212 detected automatically by I<Statistics::R>. This means that you don't have to do
213 anything on Linux systems to get I<Statistics::R> working. On Windows systems,
214 in addition to the folders described in PATH, the usual suspects will be checked
215 for the presence of the R binary, e.g. C:\Program Files\R. If I<Statistics::R>
216 does not find where R is installed, your last recourse is to specify its full
217 path when calling new():
219 my $R = Statistics::R->new( bin => $fullpath );
221 You also need to have the following CPAN Perl modules installed:
229 =item Text::Balanced (>= 1.97)
233 =item version (>= 0.77)
241 =item * L<Statistics::R::Win32>
243 =item * L<Statistics::R::Legacy>
245 =item * The R-project web site: L<http://www.r-project.org/>
247 =item * Statistics::* modules for Perl: L<http://search.cpan.org/search?query=Statistics&mode=module>
253 Florent Angly E<lt>florent.angly@gmail.comE<gt> (2011 rewrite)
255 Graciliano M. P. E<lt>gm@virtuasites.com.brE<gt> (original code)
259 Florent Angly E<lt>florent.angly@gmail.comE<gt>
261 Brian Cassidy E<lt>bricas@cpan.orgE<gt>
263 =head1 COPYRIGHT & LICENSE
265 This program is free software; you can redistribute it and/or
266 modify it under the same terms as Perl itself.
270 All complex software has bugs lurking in it, and this program is no exception.
271 If you find a bug, please report it on the CPAN Tracker of Statistics::R:
272 L<http://rt.cpan.org/Dist/Display.html?Name=Statistics-R>
274 Bug reports, suggestions and patches are welcome. The Statistics::R code is
275 developed on Github (L<http://github.com/bricas/statistics-r>) and is under Git
276 revision control. To get the latest revision, run:
278 git clone git://github.com/bricas/statistics-r.git
288 use Statistics::R::Legacy;
289 use IPC::Run qw( harness start pump finish );
290 use File::Spec::Functions qw(catfile splitpath splitdir);
291 use Text::Balanced qw ( extract_delimited extract_multiple );
293 if ( $^O =~ m/^(?:.*?win32|dos)$/i ) {
294 require Statistics::R::Win32;
297 our $VERSION = '0.34';
299 our ($SHARED_BRIDGE, $SHARED_STDIN, $SHARED_STDOUT, $SHARED_STDERR);
301 use constant DEBUG => 0; # debugging messages
302 use constant PROG => 'R'; # executable name... R
303 use constant MAXLINELEN => 1023; # maximum line length for R < 2.5
305 use constant EOS => '\\1'; # indicate the end of R output with \1
306 use constant EOS_RE => qr/[${\(EOS)}]\n$/; # regexp to match end of R stream
308 use constant NUMBER_RE => qr/^$RE{num}{real}$/; # regexp matching numbers
309 use constant BLANK_RE => qr/^\s*$/; # regexp matching whitespaces
310 use constant ILINE_RE => qr/^\s*\[\d+\] /; # regexp matching indexed line
312 my $ERROR_STR_1 = 'Error: ';
313 my $ERROR_STR_2 = 'Error in ';
314 my $ERROR_RE; # regexp matching R errors
316 my $WRAP_LINES = sub { return shift }; # function to wrap R commands
320 # Create a new R communication object
321 my ($class, %args) = @_;
323 bless $self, ref($class) || $class;
324 $self->_initialize( %args );
330 # Get (or set) the whether or not Statistics::R is setup to run in shared mode
331 my ($self, $val) = @_;
333 $self->{is_shared} = $val;
335 return $self->{is_shared};
340 no warnings 'redefine';
342 my ($self, %args) = @_;
344 if (not $self->is_started) {
346 # If shared mode option of start() requested, rebuild the bridge in shared
347 # mode. Don't use this option though. It is only here to cater for the legacy
348 # method start_shared()
349 if ( exists($args{shared}) && ($args{shared} == 1) ) {
350 $self->is_shared( 1 );
355 my $bridge = $self->_bridge;
356 $status = $bridge->start or die "Error starting ".PROG.": $?\n";
357 $self->bin( $bridge->{KIDS}->[0]->{PATH} );
358 delete $self->{died};
359 print "DBG: Started R, ".$self->bin." (pid ".$self->pid.")\n" if DEBUG;
361 # Generate regexp to catch R errors
362 if (not defined $ERROR_RE) {
363 $self->_generate_error_re;
364 $self->_localize_error_str;
365 $self->_generate_error_re;
368 # Set up a function to wrap lines for R < 2.5
369 if ( version->parse($self->version) < version->parse('2.5.0') ) {
370 print "DBG: Need to wrap to ".MAXLINELEN."\n" if DEBUG;
372 $Text::Wrap::columns = MAXLINELEN;
373 $Text::Wrap::break = ',';
374 $Text::Wrap::huge = 'overflow';
375 $Text::Wrap::separator = ",\n";
376 $WRAP_LINES = sub { return Text::Wrap::wrap('', '', shift) };
389 if ( $self->is_started ) {
390 $status = $self->_bridge->finish or die "Error stopping ".PROG.": $?\n";
391 print "DBG: Stopped R\n" if DEBUG;
399 return $self->stop && $self->start;
404 # Query whether or not R has been started and is still running - hackish.
405 # See https://rt.cpan.org/Ticket/Display.html?id=70595
408 my $bridge = $self->_bridge;
409 if (defined $bridge && not $self->{died}) {
410 if (not exists $bridge->{STATE}) {
411 die "Internal error: could not get STATE from IPC::Run\n";
413 if ($bridge->{STATE} eq IPC::Run::_started && $bridge->pumpable) {
422 # Get (or set) the PID of the running R process - hackish.
423 # See https://rt.cpan.org/Ticket/Display.html?id=70595It
424 # The PID is accessible only after the bridge has start()ed.
426 my $bridge = $self->_bridge;
427 if ( not exists $bridge->{KIDS} ) {
428 die "Internal error: could not get KIDS from IPC::Run\n";
430 if ( not exists $bridge->{KIDS}->[0]->{PID} ) {
431 die "Internal error: could not get PID from IPC::Run\n";
433 return $bridge->{KIDS}->[0]->{PID};
438 # Get or set the full path to the R binary program to use. Unless you have set
439 # the path yourself, it is accessible only after the bridge has start()ed
440 my ($self, $val) = @_;
449 # Get the version of R, e.g. '3.1.1'
451 return $self->run(q`write(paste(sep=".",R.Version()$major,R.Version()$minor), stdout())`);
456 # Pass the input and get the output
457 my ($self, @cmds) = @_;
459 # Need to start R now if it is not already running
460 $self->start if not $self->is_started;
462 # Process each command
464 for my $cmd (@cmds) {
466 # Wrap command for execution in R
467 print "DBG: Command is '$cmd'\n" if DEBUG;
468 $self->_stdin( $self->wrap_cmd($cmd) );
469 print "DBG: stdin is '".$self->_stdin."'\n" if DEBUG;
471 # Pass input to R and get its output
472 my $bridge = $self->_bridge;
473 while ( $self->_stdout !~ EOS_RE && $bridge->pumpable ) {
477 # Parse output, detect errors
478 my $out = $self->_stdout;
479 $out =~ s/${\(EOS_RE)}//;
481 my $err = $self->_stderr;
484 print "DBG: stdout is '$out'\n" if DEBUG;
485 print "DBG: stderr is '$err'\n" if DEBUG;
487 if ($err =~ $ERROR_RE) {
488 # Catch errors on stderr. Leave warnings alone.
489 print "DBG: Error\n" if DEBUG;
490 $self->{died} = 1; # for proper cleanup after failed eval
491 my $err_msg = "Error:\n".$1;
492 if ( $err_msg =~ /unrecognized escape in character string/ &&
493 version->parse($self->version) < version->parse('2.5.0') ) {
494 $err_msg .= "\nMost likely, the given R command contained lines ".
495 "exceeding ".MAXLINELEN." characters.";
499 die "Problem while running this R command:\n$cmd\n\n$err_msg\n";
502 # Save results and reinitialize
503 $results .= "\n" if $results;
504 $results .= $err.$out;
509 $self->result($results);
516 # Execute commands in given file: first, convert filepath to an R-compatible
517 # format and then pass it to source().
518 my ($self, $filepath) = @_;
519 if (not -f $filepath) {
520 die "Error: '$filepath' does not seem to exist or is not a file.\n";
524 my ($volume, $directories, $filename) = splitpath($filepath);
526 push @elems, $volume if $volume; # $volume is '' if unused
527 push @elems, splitdir($directories);
528 push @elems, $filename;
530 # Use file.path to create an R-compatible filename (bug #77761), e.g.:
531 # file <- file.path("E:", "DATA", "example.csv")
532 # Then use source() to read file and execute the commands it contains
534 my $cmd = 'source(file.path('.join(',',map {'"'.$_.'"'}@elems).'))';
535 my $results = $self->run($cmd);
542 # Get / set result of last R command
543 my ($self, $val) = @_;
545 $self->{result} = $val;
547 return $self->{result};
552 # Assign a variable or array of variables in R. Use undef if you want to
553 # assign NULL to an R variable
554 my ($self, $varname, $arr) = @_;
556 # Start R now if it is not already running
557 $self->start if not $self->is_started;
559 # Check variable type, convert everything into an arrayref
564 } elsif ($ref eq 'ARRAY') {
565 # This is an array reference, nothing to do
567 die "Error: Import variable of type $ref is not supported\n";
570 # Quote strings and nullify undef variables
571 for my $i (0 .. scalar @$arr - 1) {
572 if (defined $$arr[$i]) {
573 if ( $$arr[$i] !~ NUMBER_RE ) {
574 $$arr[$i] = _quote( $$arr[$i] );
581 # Build a variable assignment command and run it!
582 my $cmd = $varname.'<-c('.join(',',@$arr).')';
583 $cmd = &$WRAP_LINES( $cmd );
591 # Get the value of an R variable
592 my ($self, $varname) = @_;
593 my $string = $self->run(qq`print($varname)`);
597 if ($string eq 'NULL') {
599 } elsif ($string =~ ILINE_RE) {
600 # Vector: its string look like:
601 # ' [1] 6.4 13.3 4.1 1.3 14.1 10.6 9.9 9.6 15.3
602 # [16] 5.2 10.9 14.4'
603 my @lines = split /\n/, $string;
604 for my $i (0 .. scalar @lines - 1) {
605 $lines[$i] =~ s/${\(ILINE_RE)}//;
607 $value = join ' ', @lines;
609 my @lines = split /\n/, $string;
610 if (scalar @lines == 2) {
611 # String looks like: ' mean
613 # Extract value from second line
614 $value = _trim( $lines[1] );
622 if (not defined $value) {
625 # Split string into an array, paying attention to strings containing spaces:
626 # extract_delim should be enough but we use extract_delim + split because
627 # of Text::Balanced bug #73416
628 if ($value =~ m{['"]}) {
629 @arr = extract_multiple( $value, [sub { extract_delimited($_[0],q{'"}) },] );
631 for my $i (0 .. scalar @arr - 1) {
633 if ($arr[$i] =~ BLANK_RE) {
634 # Remove elements that are simply whitespaces later, in a single operation
638 $arr[$i-$nof_empty] = _unquote( _trim($elem) );
641 if ($nof_empty > 0) {
642 splice @arr, -$nof_empty, $nof_empty;
645 @arr = split( /\s+/, _trim($value) );
649 # Return either a scalar of an arrayref
651 if (scalar @arr == 1) {
661 #---------- INTERNAL METHODS --------------------------------------------------#
665 my ($self, %args) = @_;
667 # Full path of R binary specified by bin (r_bin or R_bin for backward
668 # compatibility), or executable name (IPC::Run will find its full path later)
669 $self->bin( $args{bin} || $args{r_bin} || $args{R_bin} || PROG );
672 if ( exists $args{shared} && $args{shared} == 1 ) {
673 $self->is_shared( 1 );
675 $self->is_shared( 0 );
686 # Get or build the communication bridge and IOs with R
687 my ($self, $build) = @_;
688 my %params = ( debug => 0 );
690 my $cmd = [ $self->bin, '--vanilla', '--slave' ];
691 if (not $self->is_shared) {
692 my ($stdin, $stdout, $stderr);
693 $self->{stdin} = \$stdin;
694 $self->{stdout} = \$stdout;
695 $self->{stderr} = \$stderr;
696 $self->{bridge} = harness $cmd, $self->{stdin}, $self->{stdout}, $self->{stderr}, %params;
698 $self->{stdin} = \$SHARED_STDIN ;
699 $self->{stdout} = \$SHARED_STDOUT;
700 $self->{stderr} = \$SHARED_STDERR;
701 if (not defined $SHARED_BRIDGE) {
702 # The first Statistics::R instance builds the bridge
703 $SHARED_BRIDGE = harness $cmd, $self->{stdin}, $self->{stdout}, $self->{stderr}, %params;
705 $self->{bridge} = $SHARED_BRIDGE;
708 return $self->{bridge};
713 # Get / set standard input string for R
714 my ($self, $val) = @_;
716 ${$self->{stdin}} = $val;
718 return ${$self->{stdin}};
723 # Get / set standard output string for R
724 my ($self, $val) = @_;
726 ${$self->{stdout}} = $val;
728 return ${$self->{stdout}};
733 # Get / set standard error string for R
734 my ($self, $val) = @_;
736 ${$self->{stderr}} = $val;
738 return ${$self->{stderr}};
743 # Wrap a command to pass to R. Whether the command is successful or not, the
744 # end of stream string will appear on stdout and indicate that R has finished
745 # processing the data. Note that $cmd can be multiple R commands.
746 my ($self, $cmd) = @_;
749 $cmd .= qq`; write("`.EOS.qq`",stdout())\n`;
754 sub _generate_error_re {
755 # Generate a regular expression to catch R internal errors, e.g.:
756 # Error: object 'zzz' not found"
757 # Error in print(ASDF) : object 'ASDF' not found
759 $ERROR_RE = qr/^(?:$ERROR_STR_1|$ERROR_STR_2)\s*(.*)$/s;
760 print "DBG: Regexp for catching errors is '$ERROR_RE'\n" if DEBUG;
765 sub _localize_error_str {
766 # Find the translation for the R error strings. Internationalization is
767 # present in R >=2.1, with Natural Language Support enabled.
770 for my $error_str ($ERROR_STR_1, $ERROR_STR_2) {
771 my $cmd = qq`write(ngettext(1, "$error_str", "", domain="R"), stdout())`;
772 $self->set('cmd', $cmd);
773 # Try to translate string, return '' if not possible
774 my $str = $self->run(q`tryCatch( eval(parse(text=cmd)) , error=function(e){write("",stdout())} )`);
778 ($ERROR_STR_1, $ERROR_STR_2) = @strings;
784 # The bridge to R is not automatically bombed when Statistics::R instances
785 # get out of scope. Do it now (unless running in shared mode)!
787 if (not $self->is_shared) {
793 #---------- HELPER SUBS -------------------------------------------------------#
797 # Remove flanking whitespaces
806 # Quote a string for use in R. We use double-quotes because the documentation
807 # Quotes {base} R documentation states that this is preferred over single-
808 # quotes. Double-quotes inside the string are escaped.
810 # Escape " by \" , \" by \\\" , ...
811 $str =~ s/ (\\*) " / '\\' x (2*length($1)+1) . '"' /egx;
821 # Remove surrounding "
824 # Interpolate (de-escape) \\\" to \" , \" to " , ...
825 $str =~ s/ ((?:\\\\)*) \\ " / '\\' x (length($1)*0.5) . '"' /egx;