* Precompiled regression tests::
* Compiling regression tests::
* Regtest comparison::
+* Pixel-based regtest comparison::
* Finding the cause of a regression::
* Memory and coverage tests::
* MusicXML tests::
-* Grand Regression Test Checking::
@end menu
The test comparison shows all of the changes that occurred between
the current release and the prior release. Each test that has a
-significant difference in output is displayed, with the old
-version on the left and the new version on the right.
+significant (noticeable) difference in output is displayed, with
+the old version on the left and the new version on the right.
+
+Some small changes can be ignored (slightly different slur shapes,
+small variations in note spacing), but this is not always the case:
+sometimes even the smallest change means that something is wrong.
+To help distinguish these cases, we use a bigger staff size when
+small differences matter.
+
+Staff size 30 generally means "pay extra attention to details".
+Staff size 40 (twice the default size of 20) or more means that
+the regtest @strong{is} about the details.
+
+A staff size smaller than the default carries no special meaning.
Regression tests whose output is the same for both versions are
not shown in the test comparison.
@item
Profile files: give information about
TODO? I don't know what they're for.
+Apparently they give some information about CPU usage.  If you see
+a large number of changes in cell counts, this probably means that
+you ran @code{make test-baseline} with a different number of CPU
+threads than @code{make check}.  Try redoing the tests from scratch
+with the same number of threads each time; see
+@ref{Saving time with the -j option}.
@end itemize
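The thread-count caveat above can be sketched in a few lines.  The
following Python snippet is illustrative only (the variable names and
the use of @code{os.cpu_count()} are assumptions, not part of the
LilyPond build system): it derives one job count and reuses it for both
@code{make} invocations so the profiles stay comparable.

```python
import os

# Illustrative sketch only: pick one job count and reuse it for both
# 'make test-baseline' and 'make check', so that cell-count profiles
# are produced with the same number of threads.
jobs = os.cpu_count() or 1  # fall back to 1 if the count is unknown
commands = [f"make -j{jobs} {target}" for target in ("test-baseline", "check")]
for command in commands:
    print(command)
```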
@node Regtest comparison
@section Regtest comparison
-Before modified code is committed to master, a regression test
+Before modified code is committed to @code{master} (via @code{staging}),
+a regression test
comparison must be completed to ensure that the changes have
not caused problems with previously working code. The comparison
is made automatically upon compiling the regression test suite
@item
Before making changes to the code, establish a baseline for the comparison by
-going to the @file{lilypond-git/build/} directory and running:
+going to the @file{$LILYPOND_GIT/build/} directory and running:
@example
make test-baseline
@code{make test-clean}, @code{make} and @code{make check} it without doing
@code{make test-baseline} again.}
+@node Pixel-based regtest comparison
+@section Pixel-based regtest comparison
+
+As an alternative to the @code{make test} method for regtest checking
+(which relies on @code{.signature} files that are created during a
+LilyPond run and describe the placement of grobs), there is a script
+that compares the output of two LilyPond versions pixel by pixel.  To
+use it, start by checking out the version of LilyPond you want to use
+as a baseline, and run @code{make}.  Then, do the following:
+
+@example
+cd $LILYPOND_GIT/scripts/auxiliar/
+./make-regtest-pngs.sh -j9 -o
+@end example
+
+The @code{-j9} option tells the script to use 9 CPUs to create the
+images; change this to your own CPU count plus 1.  The @code{-o} option
+means this is the "old" version.  This will create images of all the
+regtests in
+
+@example
+$LILYPOND_BUILD_DIR/out-png-check/old-regtest-results/
+@end example
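The "CPU count plus 1" rule of thumb above can be computed rather than
hard-coded.  A hedged Python sketch (@code{os.cpu_count()} is a
standard-library call; the printed command mirrors the invocation
above):

```python
import os

# Sketch: derive the -jN value (CPU count + 1) instead of hard-coding -j9.
jobs = (os.cpu_count() or 1) + 1
command = f"./make-regtest-pngs.sh -j{jobs} -o"
print(command)
```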
+
+Now check out the version you want to compare with the baseline.  Run
+@code{make} again to recreate the LilyPond binary.  Then, do the following:
+
+@example
+cd $LILYPOND_GIT/scripts/auxiliar/
+./make-regtest-pngs.sh -j9 -n
+@end example
+
+The @code{-n} option tells the script to make a "new" version of the
+images. They are created in
+
+@example
+$LILYPOND_BUILD_DIR/out-png-check/new-regtest-results/
+@end example
+
+Once the new images have been created, the script compares the old
+images with the new ones pixel by pixel and prints a list of the images
+that differ to the terminal, together with a count of how many
+differences were found.  The results of the checks are in
+
+@example
+$LILYPOND_BUILD_DIR/out-png-check/regtest-diffs/
+@end example
+
+To check for differences, browse that directory with an image
+viewer. Differences are shown in red. Be aware that some images with complex
+fonts or spacing annotations always display a few minor differences. These can
+safely be ignored.
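The pixel-by-pixel idea itself is simple to sketch.  The following
Python snippet is illustrative only, not the actual
@code{make-regtest-pngs.sh} logic: it counts differing pixels between
two same-sized images, each represented here as rows of
@code{(r, g, b)} tuples.

```python
# Illustrative sketch, not the real comparison script: count pixels
# that differ between two same-sized images, each given as rows of
# (r, g, b) tuples.
def count_pixel_diffs(old, new):
    return sum(
        1
        for old_row, new_row in zip(old, new)
        for old_px, new_px in zip(old_row, new_row)
        if old_px != new_px
    )

white, black, red = (255, 255, 255), (0, 0, 0), (255, 0, 0)
old_image = [[white, black], [white, white]]
new_image = [[white, black], [red, white]]
print(count_pixel_diffs(old_image, new_image))  # → 1
```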
+
@node Finding the cause of a regression
@section Finding the cause of a regression
For tracking memory usage as part of this test, you will need the
CVS version of GUILE, with the following patch applied:
@smallexample
-@uref{http://www.lilypond.org/vc/old/gub.darcs/patches/guile-1.9-gcstats.patch}.
+@uref{http://lilypond.org/vc/old/gub.darcs/patches/guile-1.9-gcstats.patch}.
@end smallexample
@subheading Code coverage
@uref{http://lilypond.org/doc/latest/input/regression/musicxml/collated-files}
@end example
-
-@node Grand Regression Test Checking
-@section Grand Regression Test Checking
-
-@subheading What is this all about?
-
-Regression tests (usually abbreviated "regtests") is a collection
-of @file{.ly} files used to check whether LilyPond is working correctly.
-Example: before version 2.15.12 breve noteheads had incorrect width,
-which resulted in collisions with other objects. After the issue was fixed,
-a small @file{.ly} file demonstrating the problem was added to the regression
-tests as a proof that the fix works. If someone will accidentally break
-breve width again, we will notice this in the output of that regression test.
-
-We are asking you to help us by checking a regtest or two from time to time.
-You don't need programming skills to do this, not even LilyPond skills -
-just basic music notation knowledge; checking one regtest takes less than
-a minute. Simply go here:
-
-@example
-@uref{http://www.philholmes.net/lilypond/regtests/}
-@end example
-
-@subheading Some tips on checking regtests
-
-@subsubheading Description text
-
-The description should be clear even for a music beginner.
-If there are any special terms used in the description,
-they all should be explained in our @rglosnamed{Top, Music Glossary}
-or @rinternalsnamed{Top, Internals Reference}.
-Vague descriptions (like "behaves well", "looks reasonable") shouldn't be used.
-
-@ignore
-@subsubheading Is regtest straightforward and systematic?
-
-Unfortunately some regtests are written poorly. A good regtest should be
-straightforward: it should be obvious what it checks and how. Also, it
-usually shouldn't check everything at once. For example it's a bad idea to test
-accidental placement by constucting one huge chord with many suspended notes
-and loads of accidentals. It's better to divide such problem into a series
-of clearly separated cases.
-@end ignore