1 @c -*- coding: utf-8; mode: texinfo; -*-
3 @chapter Regression tests
@menu
* Introduction to regression tests::
* Precompiled regression tests::
* Compiling regression tests::
* Regtest comparison::
* Finding the cause of a regression::
* Memory and coverage tests::
* Grand Regression Test Checking::
@end menu
17 @node Introduction to regression tests
18 @section Introduction to regression tests
20 LilyPond has a complete suite of regression tests that are used
21 to ensure that changes to the code do not break existing behavior.
22 These regression tests comprise small LilyPond snippets that test
23 the functionality of each part of LilyPond.
Regression tests are added when new functionality is added to LilyPond.
27 We do not yet have a policy on when it is appropriate to add or
28 modify a regtest when bugs are fixed. Individual developers
29 should use their best judgement until this is clarified during the
30 @ref{Grand Organization Project (GOP)}.
32 The regression tests are compiled using special @code{make}
33 targets. There are three primary uses for the regression
34 tests. First, successful completion of the regression tests means
35 that LilyPond has been properly built. Second, the output of the
36 regression tests can be manually checked to ensure that
37 the graphical output matches the description of the intended
38 output. Third, the regression test output from two different
39 versions of LilyPond can be automatically compared to identify
any differences. These differences should then be manually checked
to ensure that they are intended.
43 Regression tests (@qq{regtests}) are available in precompiled form
44 as part of the documentation. Regtests can also be compiled
on any machine that has a properly configured LilyPond build.
49 @node Precompiled regression tests
50 @section Precompiled regression tests
52 @subheading Regression test output
54 As part of the release process, the regression tests are run
55 for every LilyPond release. Full regression test output is
available for every stable version and the most recent development
version.
59 Regression test output is available in HTML and PDF format. Links
60 to the regression test output are available at the developer's
61 resources page for the version of interest.
63 The latest stable version of the regtests is found at:
66 @uref{http://lilypond.org/doc/stable/input/regression/collated-files.html}
69 The latest development version of the regtests is found at:
72 @uref{http://lilypond.org/doc/latest/input/regression/collated-files.html}
76 @subheading Regression test comparison
78 Each time a new version is released, the regtests are
79 compiled and the output is automatically compared with the
80 output of the previous release. The result of these
81 comparisons is archived online:
84 @uref{http://lilypond.org/test/}
87 Checking these pages is a very important task for the LilyPond project.
88 You are invited to report anything that looks broken, or any case
89 where the output quality is not on par with the previous release,
90 as described in @rweb{Bug reports}.
92 @warning{ The special regression test
93 @file{test-output-distance.ly} will always show up as a
94 regression. This test changes each time it is run, and serves to
95 verify that the regression tests have, in fact, run.}
98 @subheading What to look for
100 The test comparison shows all of the changes that occurred between
101 the current release and the prior release. Each test that has a
102 significant (noticeable) difference in output is displayed, with
103 the old version on the left and the new version on the right.
105 Some of the small changes can be ignored (slightly different slur
106 shapes, small variations in note spacing), but this is not always
107 the case: sometimes even the smallest change means that something
is wrong. To help distinguish these cases, we use a bigger
staff size when small differences matter.

Staff size 30 generally means @qq{pay extra attention to details}.
Staff size 40 (twice the default size) or more means
that the regtest @strong{is} about the details.

A staff size smaller than the default carries no special meaning.
117 Regression tests whose output is the same for both versions are
118 not shown in the test comparison.
@itemize
@item
Images: green blurs in the new version show the approximate
location of elements in the old version.

There are often minor adjustments in spacing which do not indicate
any problem.

@item
Log files: show the difference in command-line output.

The main thing to examine is any change in page count: if a
file used to fit on 1 page but now requires 4 or 5 pages,
something is suspicious!

@item
Profile files: their purpose is not yet documented (TODO).
@end itemize
142 The automatic comparison of the regtests checks the LilyPond
143 bounding boxes. This means that Ghostscript changes and changes
144 in lyrics or text are not found.
147 @node Compiling regression tests
148 @section Compiling regression tests
150 Developers may wish to see the output of the complete regression
151 test suite for the current version of the source repository
152 between releases. Current source code is available; see
153 @ref{Working with source code}.
155 For regression testing @code{../configure} should be run with the
156 @code{--disable-optimising} option. Then you will need
157 to build the LilyPond binary; see @ref{Compiling LilyPond}.
159 Uninstalling the previous LilyPond version is not necessary, nor is
160 running @code{make install}, since the tests will automatically be
compiled with the LilyPond binary you have just built in your source
tree.
From this point, the regtests are compiled with @code{make test}.
170 If you have a multi-core machine you may want to use the @option{-j}
171 option and @var{CPU_COUNT} variable, as
172 described in @ref{Saving time with CPU_COUNT}.
173 For a quad-core processor the complete command would be:
@example
make -j5 CPU_COUNT=5 test
@end example
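
The job count in the command above can also be derived from the machine
instead of being typed by hand. The following is only a sketch: it assumes
that @command{nproc} (from GNU coreutils) is installed, and it takes the
@qq{cores plus one} rule from the quad-core example rather than from any
stated policy. It prints the resulting command so that it can be inspected
before it is run.

@example
# Assumption: nproc is available; otherwise fall back to a single core.
cores=$(nproc 2>/dev/null || echo 1)
# One job more than the number of cores, as in the quad-core example.
njobs=$((cores + 1))
echo "make -j$njobs CPU_COUNT=$njobs test"
@end example

On a machine where @command{nproc} reports four processors, this prints the
same command as the quad-core example.
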
179 The regtest output will then be available in
180 @file{input/regression/out-test}.
181 @file{input/regression/out-test/collated-examples.html}
182 contains a listing of all the regression tests that were run,
183 but none of the images are included. Individual images are
184 also available in this directory.
186 The primary use of @samp{make@tie{}test} is to verify that the
187 regression tests all run without error. The regression test
188 page that is part of the documentation is created only when the
189 documentation is built, as described in @ref{Generating documentation}.
190 Note that building the documentation requires more installed components
191 than building the source code, as described in
192 @ref{Requirements for building documentation}.
195 @node Regtest comparison
196 @section Regtest comparison
Before modified code is committed to @code{master} (via @code{staging}),
a regression test comparison must be completed to ensure that the changes
have not caused problems with previously working code. The comparison
is made automatically upon compiling the regression test suite twice:
once to establish a baseline and once after your changes.
208 Run @code{make} with current git master without any of your changes.
211 Before making changes to the code, establish a baseline for the comparison by
going to the @file{$LILYPOND_GIT/build/} directory and running
@code{make test-baseline}.
219 Make your changes, or apply the patch(es) to consider.
222 Compile the source with @samp{make} as usual.
Check for unintentional changes to the regtests by running
@code{make check}.
231 After this has finished, a regression test comparison will be
232 available (relative to the current @file{build/} directory) at:
@example
out/test-results/index.html
@end example
238 For each regression test that differs between the baseline and the
239 changed code, a regression test entry will be displayed. Ideally,
240 the only changes would be the changes that you were working on.
If regressions are introduced, they must be fixed before the code is
committed.
@warning{The special regression test @file{test-output-distance.ly} will always
246 show up as a regression. This test changes each time it is run, and
247 serves to verify that the regression tests have, in fact, run.}
250 If you are happy with the results, then stop now.
252 If you want to continue programming, then make any additional code
253 changes, and continue.
256 Compile the source with @samp{make} as usual.
259 To re-check files that differed between the initial
260 @samp{make@tie{}test-baseline} and your post-changes
261 @samp{make@tie{}check}, run:
267 This updates the regression list at @file{out/test-results/index.html}.
268 It does @emph{not} redo @file{test-output-distance.ly}.
271 When all regressions have been resolved, the output list will be empty.
Once all regressions have been resolved, a final check should be completed
by running @code{make test-clean} followed by @code{make check}.
282 This cleans the results of the previous @samp{make@tie{}check}, then does the
283 automatic regression comparison again.
@warning{Once a test baseline has been established, there is no need to
run it again unless git master has changed. In other words, if you work
with several branches and want to do a regtest comparison for each of
them, you can run @code{make test-baseline} with git master, check out a
branch, and run @code{make} and @code{make check} on it; then switch to
another branch and run @code{make test-clean}, @code{make} and
@code{make check} without running @code{make test-baseline} again.}
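
The comparison workflow described in this section can be condensed into
two small shell functions. This is a minimal sketch, not part of the
LilyPond build system: the function names @code{regtest_baseline} and
@code{regtest_compare} and the @code{MAKE} override are hypothetical, and
both functions are assumed to be run from the @file{build/} directory.

@example
# MAKE is a hypothetical override, e.g. MAKE="make -j5 CPU_COUNT=5".
[ -n "$MAKE" ] || MAKE=make

# Build unmodified git master, then record the baseline.
regtest_baseline() (
  $MAKE && $MAKE test-baseline
)

# After applying your changes: rebuild, then compare with the baseline.
regtest_compare() (
  $MAKE && $MAKE check && echo "results: out/test-results/index.html"
)
@end example

Run @code{regtest_baseline} once on git master, apply the changes, then run
@code{regtest_compare}. Each function stops at its first failing step.
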
297 @node Finding the cause of a regression
298 @section Finding the cause of a regression
Git has special functionality to help track down the exact
commit that causes a problem. See the git manual page for
302 @code{git bisect}. This is a job that non-programmers can do,
303 although it requires familiarity with git, ability to compile
304 LilyPond, and generally a fair amount of technical knowledge. A
305 brief summary is given below, but you may need to consult other
306 documentation for in-depth explanations.
Even if you are not familiar with git or are not able to compile
LilyPond, you can still help to narrow down the cause of a
regression simply by downloading the binary releases of different
LilyPond versions and testing them for the regression. Knowing
which version of LilyPond first exhibited the regression is
helpful to a developer as it shortens the @code{git bisect}
process.
316 Once a problematic commit is identified, the programmers' job is
317 much easier. In fact, for most regression bugs, the majority of
318 the time is spent simply finding the problematic commit.
320 More information is in @ref{Regression tests}.
322 @subheading git bisect setup
We need to set up the bisect for each problem we want to
investigate.
327 Suppose we have an input file which compiled in version 2.13.32,
328 but fails in version 2.13.38 and above.
After starting the session with @code{git bisect start}, give it the
earliest known bad tag:

@example
git bisect bad release/2.13.38-1
@end example

(You can list the available tags with @code{git tag}.)
348 Give it the latest known good tag:
@example
git bisect good release/2.13.32-1
@end example
354 You should now see something like:
@example
Bisecting: 195 revisions left to test after this (roughly 8 steps)
[b17e2f3d7a5853a30f7d5a3cdc6b5079e77a3d2a] Web: Announcement
update for the new @qq{LilyPond Report}.
@end example
363 @subheading git bisect actual
375 Test your input file:
@example
out/bin/lilypond test.ly
@end example
Does it crash, or is the output bad? If so, run @code{git bisect bad}.

Does your input file produce good output? If so, run
@code{git bisect good}.
402 Once the exact problem commit has been identified, git will inform
403 you with a message like:
@example
6d28aebbaaab1be9961a00bf15a1ef93acb91e30 is the first bad commit
%%% ... blah blah blah ...
@end example
410 If there is still a range of commits, then git will automatically
411 select a new version for you to test. Go to step #1.
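
The manual loop above can be partly automated with @code{git bisect run},
which runs a command at each step and reads its exit status: 0 marks the
commit as good, 125 skips it, and any other status from 1 to 127 marks it
as bad. The sketch below is an illustration under stated assumptions, not
a procedure from this guide: the function name @code{bisect_step} and the
@code{BUILD} and @code{CHECK} variables are hypothetical, and it can only
detect a failing or crashing run, not output that merely looks wrong.

@example
# Hypothetical defaults; both are assumed to be run from the build/ directory.
[ -n "$BUILD" ] || BUILD="make"
[ -n "$CHECK" ] || CHECK="out/bin/lilypond test.ly"

bisect_step() (
  # A commit that does not build can be neither good nor bad: skip it.
  $BUILD || exit 125
  # Map every failure, including a crash, to a plain "bad" status.
  $CHECK || exit 1
  exit 0
)
@end example

A script that defines @code{bisect_step} and ends by calling it could then
be passed to @code{git bisect run} once the good and bad tags have been
given.
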
415 @subheading Recommendation: use two terminal windows
419 One window is open to the @code{build/} directory, and alternates
420 between these commands:
@example
make
out/bin/lilypond test.ly
@end example
One window is open to the top source directory, and alternates
between @code{git bisect good} and @code{git bisect bad}.
439 @node Memory and coverage tests
440 @section Memory and coverage tests
442 In addition to the graphical output of the regression tests, it is
443 possible to test memory usage and to determine how much of the source
444 code has been exercised by the tests.
446 @subheading Memory usage
448 For tracking memory usage as part of this test, you will need
449 GUILE CVS; especially the following patch:
451 @uref{http://www.lilypond.org/vc/old/gub.darcs/patches/guile-1.9-gcstats.patch}.
454 @subheading Code coverage
For checking the coverage of the test suite, do the following:

@example
./scripts/auxiliar/build-coverage.sh
@emph{# uncovered files, least covered first}
./scripts/auxiliar/coverage.py --summary out-cov/*.cc
@emph{# consecutive uncovered lines, longest first}
./scripts/auxiliar/coverage.py --uncovered out-cov/*.cc
@end example
@node MusicXML tests
@section MusicXML tests
471 LilyPond comes with a complete set of regtests for the
472 @uref{http://www.musicxml.org/,MusicXML} language. Originally
473 developed to test @samp{musicxml2ly}, these regression tests
474 can be used to test any MusicXML implementation.
476 The MusicXML regression tests are found at
477 @file{input/regression/musicxml/}.
479 The output resulting from running these tests
480 through @samp{musicxml2ly} followed by @samp{lilypond} is
481 available in the LilyPond documentation:
484 @uref{http://lilypond.org/doc/latest/input/regression/musicxml/collated-files}
488 @node Grand Regression Test Checking
489 @section Grand Regression Test Checking
491 @subheading What is this all about?
Regression tests (usually abbreviated @qq{regtests}) are a collection
of @file{.ly} files used to check whether LilyPond is working correctly.
Example: before version 2.15.12, breve noteheads had an incorrect width,
which resulted in collisions with other objects. After the issue was fixed,
a small @file{.ly} file demonstrating the problem was added to the regression
tests as proof that the fix works. If someone accidentally breaks the
breve width again, we will notice it in the output of that regression test.
501 @subheading How can I help?
503 We ask you to help us by checking one or two regtests from time to time.
You don't need programming skills to do this, or even LilyPond skills;
basic music notation knowledge is enough, and checking one regtest takes
less than a minute. Simply go here:
509 @uref{http://www.philholmes.net/lilypond/regtests/}
512 @subheading Some tips on checking regtests
514 @subsubheading Description text
The description should be clear even to a music beginner.
If any special terms are used in the description,
they should all be explained in our @rglosnamed{Top, Music Glossary}
or @rinternalsnamed{Top, Internals Reference}.
Vague descriptions (like @qq{behaves well} or @qq{looks reasonable})
shouldn't be used.
@c this may be useful for advanced regtest checking
@subsubheading Is the regtest straightforward and systematic?

Unfortunately, some regtests are written poorly. A good regtest should be
straightforward: it should be obvious what it checks and how. Also, it
usually shouldn't check everything at once. For example, it's a bad idea to
test accidental placement by constructing one huge chord with many suspended
notes and loads of accidentals. It's better to divide such a problem into a
series of clearly separated cases.