Documentation/contributor/regressions.itexi

@c -*- coding: utf-8; mode: texinfo; -*-
@node Regression tests
@chapter Regression tests

@menu
* Introduction to regression tests::
* Precompiled regression tests::
* Compiling regression tests::
* Regtest comparison::
* Pixel-based regtest comparison::
* Finding the cause of a regression::
* Memory and coverage tests::
* MusicXML tests::
@end menu


@node Introduction to regression tests
@section Introduction to regression tests

LilyPond has a complete suite of regression tests that are used
to ensure that changes to the code do not break existing behavior.
These regression tests comprise small LilyPond snippets that test
the functionality of each part of LilyPond.

Regression tests are added when new functionality is added to
LilyPond.
We do not yet have a policy on when it is appropriate to add or
modify a regtest when bugs are fixed.  Individual developers
should use their best judgement until this is clarified during the
@ref{Grand Organization Project (GOP)}.

The regression tests are compiled using special @code{make}
targets.  There are three primary uses for the regression
tests.  First, successful completion of the regression tests means
that LilyPond has been properly built.  Second, the output of the
regression tests can be manually checked to ensure that
the graphical output matches the description of the intended
output.  Third, the regression test output from two different
versions of LilyPond can be automatically compared to identify
any differences.  These differences should then be manually
checked to ensure that they are intended.

Regression tests (@qq{regtests}) are available in precompiled form
as part of the documentation.  Regtests can also be compiled
on any machine that has a properly configured LilyPond build
system.


@node Precompiled regression tests
@section Precompiled regression tests

@subheading Regression test output

As part of the release process, the regression tests are run
for every LilyPond release.  Full regression test output is
available for every stable version and the most recent development
version.

Regression test output is available in HTML and PDF format.  Links
to the regression test output are available at the developer's
resources page for the version of interest.

The latest stable version of the regtests is found at:

@example
@uref{http://lilypond.org/doc/stable/input/regression/collated-files.html}
@end example

The latest development version of the regtests is found at:

@example
@uref{http://lilypond.org/doc/latest/input/regression/collated-files.html}
@end example


@subheading Regression test comparison

Each time a new version is released, the regtests are
compiled and the output is automatically compared with the
output of the previous release.  The results of these
comparisons are archived online:

@example
@uref{http://lilypond.org/test/}
@end example

Checking these pages is a very important task for the LilyPond project.
You are invited to report anything that looks broken, or any case
where the output quality is not on par with the previous release,
as described in @rweb{Bug reports}.

@warning{The special regression test
@file{test-output-distance.ly} will always show up as a
regression.  This test changes each time it is run, and serves to
verify that the regression tests have, in fact, run.}


@subheading What to look for

The test comparison shows all of the changes that occurred between
the current release and the prior release.  Each test that has a
significant (noticeable) difference in output is displayed, with
the old version on the left and the new version on the right.

Some of the small changes can be ignored (slightly different slur
shapes, small variations in note spacing), but this is not always
the case: sometimes even the smallest change means that something
is wrong.  To help in distinguishing these cases, we use a bigger
staff size when small differences matter.

A staff size of 30 generally means @qq{pay extra attention to details}.
A staff size of 40 (twice the default size of 20) or more means
that the regtest @strong{is} about the details.

A staff size smaller than the default has no special meaning.
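
The staff-size convention above can also be checked from the command
line.  The following POSIX shell function is only a sketch: its name,
@code{staff_size_hint}, is hypothetical and not part of the LilyPond
sources, and it assumes that a regtest sets its size with a literal
@code{#(set-global-staff-size @var{n})} call.  Sizes that are set in
any other way are reported as @qq{normal}, so treat the result as a
hint rather than a verdict.

@example
# Sketch (hypothetical helper): print how closely to inspect a regtest,
# based on the first #(set-global-staff-size N) call in file $1.
# Below 30 (or no such call): normal; 30 to 39: extra attention;
# 40 and up (twice the default of 20): the test is about the details.
staff_size_hint() (
  size=$(sed -n 's/.*set-global-staff-size *\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1)
  if [ -z "$size" ] || [ "$size" -lt 30 ]; then
    echo "normal"
  elif [ "$size" -lt 40 ]; then
    echo "pay extra attention to details"
  else
    echo "the test is about the details"
  fi
)
@end example

For example, @code{staff_size_hint input/regression/@var{test}.ly}
prints one of the three categories for the given regtest.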

Regression tests whose output is the same for both versions are
not shown in the test comparison.

@itemize
@item
Images: green blurs in the new version show the approximate
location of elements in the old version.

There are often minor adjustments in spacing which do not indicate
any problem.

@item
Log files: show the difference in command-line output.

The main thing to examine is any change in page count: if a
file used to fit on 1 page but now requires 4 or 5 pages,
something is suspicious!

@item
Profile files: these apparently give some information about CPU
usage; their exact purpose is not yet documented.  If there are
many changes in cell counts, this probably means that
@code{make test-baseline} and @code{make check} were run with a
different number of CPU threads.  Try redoing the tests from scratch
with the same number of threads each time; see
@ref{Saving time with the -j option}.

@end itemize

@warning{
The automatic comparison of the regtests checks the LilyPond
bounding boxes.  This means that Ghostscript changes and changes
in lyrics or text are not detected.
}

@node Compiling regression tests
@section Compiling regression tests

Developers may wish to see the output of the complete regression
test suite for the current version of the source repository
between releases.  Current source code is available; see
@ref{Working with source code}.

For regression testing, @code{../configure} should be run with the
@code{--disable-optimising} option.  Then you will need
to build the LilyPond binary; see @ref{Compiling LilyPond}.

Uninstalling the previous LilyPond version is not necessary, nor is
running @code{make install}, since the tests will automatically be
compiled with the LilyPond binary you have just built in your source
directory.

From this point, the regtests are compiled with:

@example
make test
@end example

If you have a multi-core machine, you may want to use the @option{-j}
option and the @var{CPU_COUNT} variable, as
described in @ref{Saving time with CPU_COUNT}.
For a quad-core processor the complete command would be:

@example
make -j5 CPU_COUNT=5 test
@end example
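
The job count in this example is the number of cores plus one.  As a
hedged sketch, the following POSIX shell function derives the same
command for any machine.  The name @code{regtest_command} is
hypothetical, and @code{getconf _NPROCESSORS_ONLN} is assumed to report
the number of online cores; it does on common Linux and macOS systems,
but POSIX does not require that variable.

@example
# Sketch (hypothetical helper): print the regtest command for a machine
# with $1 cores, using cores + 1 jobs as in the quad-core example.
# Without an argument, ask the system for the number of online cores.
regtest_command() (
  cores=$1
  if [ -z "$cores" ]; then
    # Widely supported, but not required by POSIX.
    cores=$(getconf _NPROCESSORS_ONLN)
  fi
  jobs=$((cores + 1))
  echo "make -j$jobs CPU_COUNT=$jobs test"
)
@end example

Because the function only prints the command, it can be checked first;
@code{eval "$(regtest_command)"} in the build directory would then run
it.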

The regtest output will then be available in
@file{input/regression/out-test}.  The file
@file{input/regression/out-test/collated-examples.html}
contains a listing of all the regression tests that were run,
but none of the images are included.  Individual images are
also available in this directory.

The primary use of @samp{make@tie{}test} is to verify that the
regression tests all run without error.  The regression test
page that is part of the documentation is created only when the
documentation is built, as described in @ref{Generating documentation}.
Note that building the documentation requires more installed components
than building the source code, as described in
@ref{Requirements for building documentation}.


@node Regtest comparison
@section Regtest comparison

Before modified code is committed to @code{master} (via @code{staging}),
a regression test
comparison must be completed to ensure that the changes have
not caused problems with previously working code.  The comparison
is made automatically upon compiling the regression test suite
twice.

@enumerate

@item
Run @code{make} with the current git master, without any of your changes.

@item
Before making changes to the code, establish a baseline for the comparison by
going to the @file{$LILYPOND_GIT/build/} directory and running:

@example
make test-baseline
@end example

@item
Make your changes, or apply the patch(es) to be tested.

@item
Compile the source with @samp{make} as usual.

@item
Check for unintentional changes to the regtests:

@example
make check
@end example

After this has finished, a regression test comparison will be
available (relative to the current @file{build/} directory) at:

@example
out/test-results/index.html
@end example

For each regression test that differs between the baseline and the
changed code, a regression test entry will be displayed.  Ideally,
the only changes would be the changes that you were working on.
If regressions are introduced, they must be fixed before
committing the code.

@warning{
The special regression test @file{test-output-distance.ly} will always
show up as a regression.  This test changes each time it is run, and
serves to verify that the regression tests have, in fact, run.}

@item
If you are happy with the results, then stop now.

If you want to continue programming, then make any additional code
changes, and continue.

@item
Compile the source with @samp{make} as usual.

@item
To re-check files that differed between the initial
@samp{make@tie{}test-baseline} and your post-changes
@samp{make@tie{}check}, run:

@example
make test-redo
@end example

This updates the regression list at @file{out/test-results/index.html}.
It does @emph{not} redo @file{test-output-distance.ly}.

@item
When all regressions have been resolved, the output list will be empty.

@item
Once all regressions have been resolved, a final check should be completed
by running:

@example
make test-clean
make check
@end example

This cleans the results of the previous @samp{make@tie{}check}, then does the
automatic regression comparison again.

@end enumerate

@advanced{
Once a test baseline has been established, there is no need to run it again
unless git master has changed.  In other words, if you work with several
branches and want to do regtest comparisons for all of them, you can
@code{make test-baseline} with git master, check out some branch,
@code{make} and @code{make check} it, then switch to another branch,
@code{make test-clean}, @code{make} and @code{make check} it without doing
@code{make test-baseline} again.}
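
The multi-branch workflow in this note can be scripted.  The following
POSIX shell sketch assumes that it is run from the @file{build/}
directory, that each branch to be compared is a local git branch, and
that the working tree has no uncommitted changes; the function names
@code{run} and @code{compare_branches} are hypothetical.  With
@code{DRY_RUN=1} set, it only prints the commands it would run.
Because every @code{make check} overwrites @file{out/test-results/},
a real run pauses after each branch so that the results can be
inspected.

@example
# Sketch (hypothetical helpers): one baseline from master, then a
# comparison for each branch given as an argument.  Run from build/.
# With DRY_RUN=1 the commands are only printed, not executed.
run() (
  echo "+ $*"
  if [ -z "$DRY_RUN" ]; then
    eval "$*"
  fi
)

compare_branches() (
  run git checkout master || exit 1
  run make || exit 1
  run make test-baseline || exit 1
  for branch do
    run git checkout "$branch" || exit 1
    run make test-clean || exit 1
    run make || exit 1
    run make check || exit 1
    if [ -z "$DRY_RUN" ]; then
      # make check overwrites out/test-results, so inspect it now.
      printf 'Inspect out/test-results/index.html for %s, then press Enter: ' "$branch"
      read -r answer
    fi
  done
)
@end example

For example, @code{DRY_RUN=1 compare_branches fix-a fix-b} prints the
eleven commands for two hypothetical branches without running any of
them.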

@node Pixel-based regtest comparison
@section Pixel-based regtest comparison

As an alternative to the @code{make test} method for regtest checking (which
relies upon @code{.signature} files that are created by a LilyPond run and
describe the placement of grobs), there is a script which compares the output
of two LilyPond versions pixel by pixel.  To use this, start by checking out the
version of LilyPond you want to use as a baseline, and run @code{make}.  Then,
do the following:

@example
cd $LILYPOND_GIT/scripts/auxiliar/
./make-regtest-pngs.sh -j9 -o
@end example

The @code{-j9} option tells the script to use 9 CPUs to create the
images; change this to your own CPU count plus one.  The @code{-o} option
marks this as the @qq{old} version.  This will create images of all the
regtests in

@example
$LILYPOND_BUILD_DIR/out-png-check/old-regtest-results/
@end example

Now check out the version you want to compare with the baseline.  Run
@code{make} again to recreate the LilyPond binary.  Then, do the following:

@example
cd $LILYPOND_GIT/scripts/auxiliar/
./make-regtest-pngs.sh -j9 -n
@end example

The @code{-n} option tells the script to make a @qq{new} version of the
images.  They are created in

@example
$LILYPOND_BUILD_DIR/out-png-check/new-regtest-results/
@end example

Once the new images have been created, the script compares the old images with
the new ones pixel by pixel and prints a list of the different images to the
terminal, together with a count of how many differences were found.  The
results of the checks are in

@example
$LILYPOND_BUILD_DIR/out-png-check/regtest-diffs/
@end example

To check for differences, browse that directory with an image
viewer.  Differences are shown in red.  Be aware that some images with complex
fonts or spacing annotations always display a few minor differences.  These can
safely be ignored.
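
The two passes described above can be combined into one driver.  This
POSIX shell function is a sketch, not part of the LilyPond tools: the
names @code{run} and @code{png_compare} are hypothetical, it assumes
that @env{LILYPOND_GIT} and @env{LILYPOND_BUILD_DIR} are set as in the
paths above and contain no spaces, and it takes the job count from
@env{JOBS} or, failing that, from @code{getconf _NPROCESSORS_ONLN} plus
one.  With @code{DRY_RUN=1} it only prints the commands.

@example
# Sketch (hypothetical helpers): run both make-regtest-pngs.sh passes,
# first for revision $1 ("old"), then for revision $2 ("new").
# Assumes LILYPOND_GIT and LILYPOND_BUILD_DIR are set and contain no
# spaces.  With DRY_RUN=1 the commands are only printed.
run() (
  echo "+ $*"
  if [ -z "$DRY_RUN" ]; then
    eval "$*"
  fi
)

png_compare() (
  old=$1
  new=$2
  jobs=$JOBS
  if [ -z "$jobs" ]; then
    # CPU count plus one, as recommended for the -j option.
    jobs=$(( $(getconf _NPROCESSORS_ONLN) + 1 ))
  fi
  aux=$LILYPOND_GIT/scripts/auxiliar
  run "git -C $LILYPOND_GIT checkout $old" || exit 1
  run "cd $LILYPOND_BUILD_DIR && make" || exit 1
  run "cd $aux && ./make-regtest-pngs.sh -j$jobs -o" || exit 1
  run "git -C $LILYPOND_GIT checkout $new" || exit 1
  run "cd $LILYPOND_BUILD_DIR && make" || exit 1
  run "cd $aux && ./make-regtest-pngs.sh -j$jobs -n"
)
@end example

For instance, @code{png_compare @var{old-tag} master} would compare a
release tag with the current master; the differences then appear in
@file{$LILYPOND_BUILD_DIR/out-png-check/regtest-diffs/} as described
above.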


@node Finding the cause of a regression
@section Finding the cause of a regression

Git has special functionality to help track down the exact
commit which causes a problem.  See the git manual page for
@code{git bisect}.  This is a job that non-programmers can do,
although it requires familiarity with git, the ability to compile
LilyPond, and generally a fair amount of technical knowledge.  A
brief summary is given below, but you may need to consult other
documentation for in-depth explanations.

Even if you are not familiar with git or are not able to compile
LilyPond, you can still help to narrow down the cause of a
regression simply by downloading the binary releases of different
LilyPond versions and testing them for the regression.  Knowing
which version of LilyPond first exhibited the regression is
helpful to a developer, as it shortens the @code{git bisect}
procedure.
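
Testing several downloaded releases by hand is repetitive, so a small
loop can help.  The following POSIX shell function is a sketch: the
name @code{try_versions} is hypothetical, and it assumes that you pass
the path of each installed @command{lilypond} executable.  Each version
writes its output under a separate name (@file{try-1}, @file{try-2},
@dots{}) through LilyPond's @option{-o} option, so the results can be
compared by eye afterwards; a zero exit status only means that no error
was reported, not that the output is correct.

@example
# Sketch (hypothetical helper): run several lilypond executables on the
# same input file and report which of them exit with an error.
# Usage: try_versions FILE.ly LILYPOND1 LILYPOND2 ...
try_versions() (
  input=$1
  shift
  n=0
  for lily do
    n=$((n + 1))
    # -o gives each version its own output name: try-1, try-2, ...
    if "$lily" -o "try-$n" "$input" >/dev/null 2>&1; then
      echo "ok   try-$n $lily"
    else
      echo "FAIL try-$n $lily"
    fi
  done
)
@end example

For example, @code{try_versions test.ly @var{old}/bin/lilypond
@var{new}/bin/lilypond} prints one line per version, marked @samp{ok}
or @samp{FAIL}, where @var{old} and @var{new} stand for the install
locations of the releases being compared.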

Once a problematic commit is identified, the programmers' job is
much easier.  In fact, for most regression bugs, the majority of
the time is spent simply finding the problematic commit.

More information is in @ref{Regression tests}.

@subheading git bisect setup

We need to set up the bisect for each problem we want to
investigate.

Suppose we have an input file which compiled in version 2.13.32,
but fails in version 2.13.38 and above.

@enumerate
@item
Begin the process:

@example
git bisect start
@end example

@item
Give it the earliest known bad tag:

@example
git bisect bad release/2.13.38-1
@end example

(You can list the available tags with @code{git tag}.)

@item
Give it the latest known good tag:

@example
git bisect good release/2.13.32-1
@end example

You should now see something like:
@example
Bisecting: 195 revisions left to test after this (roughly 8 steps)
[b17e2f3d7a5853a30f7d5a3cdc6b5079e77a3d2a] Web: Announcement
update for the new @qq{LilyPond Report}.
@end example

@end enumerate

@subheading git bisect actual

@enumerate

@item
Compile the source:

@example
make
@end example

@item
Test your input file:

@example
out/bin/lilypond test.ly
@end example

@item
Check the test results:

@itemize
@item
Does it crash, or is the output bad?  If so:

@example
git bisect bad
@end example

@item
Does your input file produce good output?  If so:

@example
git bisect good
@end example

@end itemize

@item
Once the exact problem commit has been identified, git will inform
you with a message like:

@example
6d28aebbaaab1be9961a00bf15a1ef93acb91e30 is the first bad commit
%%% ... blah blah blah ...
@end example

If there is still a range of commits, then git will automatically
select a new version for you to test.  Go back to step 1.

@end enumerate
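
Git can also drive this loop itself with @code{git bisect run}, which
runs a command on each candidate commit and reads its exit status: 0
marks the commit good, 125 skips it, any other value from 1 to 127
marks it bad, and anything else aborts the bisect.  The following
POSIX shell function is a sketch built on that convention; the name
@code{bisect_check} is hypothetical.  It takes the build directory and
the test file as arguments, treats a failed @code{make} as
@qq{cannot test}, and treats a LilyPond error as @qq{bad}.  It cannot
judge output that compiles but looks wrong, so it only helps with
crashes and errors.

@example
# Sketch (hypothetical helper) for "git bisect run": classify the
# commit that is currently checked out.
# $1 = build directory, $2 = test file (relative to $1, or absolute).
# Exit status: 0 = good, 1 = bad, 125 = skip, 128 = abort the bisect.
bisect_check() (
  cd "$1" || exit 128
  # A commit that does not build cannot be tested: skip it.
  make || exit 125
  # A LilyPond error or crash marks the commit as bad.
  out/bin/lilypond "$2" || exit 1
  exit 0
)
@end example

With the function saved in a file (@file{bisect-check.sh} is only an
example name), a run could look like
@code{git bisect run sh -c '. /path/to/bisect-check.sh && bisect_check /path/to/build /path/to/test.ly'};
absolute paths are the safest choice.  When the first bad commit has
been found, @code{git bisect reset} returns to the original branch.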

@subheading Recommendation: use two terminal windows

@itemize
@item
One window is open to the @code{build/} directory, and alternates
between these commands:

@example
make
out/bin/lilypond test.ly
@end example

@item
One window is open to the top source directory, and alternates
between these commands:

@example
git bisect good
git bisect bad
@end example

@end itemize


@node Memory and coverage tests
@section Memory and coverage tests

In addition to the graphical output of the regression tests, it is
possible to test memory usage and to determine how much of the source
code has been exercised by the tests.

@subheading Memory usage

For tracking memory usage as part of this test, you will need
GUILE CVS, in particular the following patch:
@smallexample
@uref{http://lilypond.org/vc/old/gub.darcs/patches/guile-1.9-gcstats.patch}
@end smallexample

@subheading Code coverage

For checking the coverage of the test suite, do the following:

@example
./scripts/auxiliar/build-coverage.sh
@emph{# uncovered files, least covered first}
./scripts/auxiliar/coverage.py  --summary out-cov/*.cc
@emph{# consecutive uncovered lines, longest first}
./scripts/auxiliar/coverage.py  --uncovered out-cov/*.cc
@end example


@node MusicXML tests
@section MusicXML tests


LilyPond comes with a complete set of regtests for the
@uref{http://www.musicxml.org/,MusicXML} language.  Originally
developed to test @samp{musicxml2ly}, these regression tests
can be used to test any MusicXML implementation.

The MusicXML regression tests are found at
@file{input/regression/musicxml/}.

The output resulting from running these tests
through @samp{musicxml2ly} followed by @samp{lilypond} is
available in the LilyPond documentation:

@example
@uref{http://lilypond.org/doc/latest/input/regression/musicxml/collated-files}
@end example
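
To run the MusicXML tests locally through the same two steps, a loop
over the test directory is enough.  The following POSIX shell function
is a sketch: the name @code{musicxml_check} is hypothetical, it assumes
that @command{musicxml2ly} and @command{lilypond} are on the
@env{PATH}, and it only looks at files ending in @file{.xml}.  Output
files are written to the current directory, so run it from a scratch
directory; it prints only the tests for which a step fails.

@example
# Sketch (hypothetical helper): convert each MusicXML regtest in
# directory $1 with musicxml2ly, typeset the result with lilypond,
# and print the tests for which either step fails.
musicxml_check() (
  dir=$1
  for xml in "$dir"/*.xml; do
    # Skip the literal pattern if the directory has no .xml files.
    [ -e "$xml" ] || continue
    name=$(basename "$xml" .xml)
    if ! musicxml2ly -o "$name.ly" "$xml" >/dev/null 2>&1; then
      echo "musicxml2ly failed: $name"
    elif ! lilypond "$name.ly" >/dev/null 2>&1; then
      echo "lilypond failed: $name"
    fi
  done
)
@end example

For example, @code{musicxml_check $LILYPOND_GIT/input/regression/musicxml}
checks the whole set and lists only the failures.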