Documentation/contributor/regressions.itexi

   1 @c -*- coding: utf-8; mode: texinfo; -*-
   2 @node Regression tests
   3 @chapter Regression tests
   4
   5 @menu
   6 * Introduction to regression tests::
   7 * Precompiled regression tests::
   8 * Compiling regression tests::
   9 * Regtest comparison::
  10 * Finding the cause of a regression::
  11 * Memory and coverage tests::
  12 * MusicXML tests::
  13 * Grand Regression Test Checking::
  14 @end menu
  15
  16
  17 @node Introduction to regression tests
  18 @section Introduction to regression tests
  19
  20 LilyPond has a complete suite of regression tests that are used
  21 to ensure that changes to the code do not break existing behavior.
  22 These regression tests comprise small LilyPond snippets that test
  23 the functionality of each part of LilyPond.
  24
  25 Regression tests are added when new functionality is added to
  26 LilyPond.
  27 We do not yet have a policy on when it is appropriate to add or
  28 modify a regtest when bugs are fixed.  Individual developers
  29 should use their best judgement until this is clarified during the
  30 @ref{Grand Organization Project (GOP)}.
  31
  32 The regression tests are compiled using special @code{make}
  33 targets.  There are three primary uses for the regression
  34 tests.  First, successful completion of the regression tests means
  35 that LilyPond has been properly built.  Second, the output of the
  36 regression tests can be manually checked to ensure that
  37 the graphical output matches the description of the intended
  38 output.  Third, the regression test output from two different
  39 versions of LilyPond can be automatically compared to identify
  40 any differences.  These differences should then be manually
  41 checked to ensure that the differences are intended.
  42
  43 Regression tests (@qq{regtests}) are available in precompiled form
  44 as part of the documentation.  Regtests can also be compiled
  45 on any machine that has a properly configured LilyPond build
  46 system.
  47
  48
  49 @node Precompiled regression tests
  50 @section Precompiled regression tests
  51
  52 @subheading Regression test output
  53
  54 As part of the release process, the regression tests are run
  55 for every LilyPond release.  Full regression test output is
  56 available for every stable version and the most recent development
  57 version.
  58
  59 Regression test output is available in HTML and PDF format.  Links
  60 to the regression test output are available at the developer's
  61 resources page for the version of interest.
  62
  63 The latest stable version of the regtests is found at:
  64
  65 @example
  66 @uref{http://lilypond.org/doc/stable/input/regression/collated-files.html}
  67 @end example
  68
  69 The latest development version of the regtests is found at:
  70
  71 @example
  72 @uref{http://lilypond.org/doc/latest/input/regression/collated-files.html}
  73 @end example
  74
  75
  76 @subheading Regression test comparison
  77
  78 Each time a new version is released, the regtests are
  79 compiled and the output is automatically compared with the
  80 output of the previous release.  The result of these
  81 comparisons is archived online:
  82
  83 @example
  84 @uref{http://lilypond.org/test/}
  85 @end example
  86
  87 Checking these pages is a very important task for the LilyPond project.
  88 You are invited to report anything that looks broken, or any case
  89 where the output quality is not on par with the previous release,
  90 as described in @rweb{Bug reports}.
  91
  92 @warning{ The special regression test
  93 @file{test-output-distance.ly} will always show up as a
  94 regression.  This test changes each time it is run, and serves to
  95 verify that the regression tests have, in fact, run.}
  96
  97
  98 @subheading What to look for
  99
 100 The test comparison shows all of the changes that occurred between
 101 the current release and the prior release.  Each test that has a
 102 significant (noticeable) difference in output is displayed, with
 103 the old version on the left and the new version on the right.
 104
 105 Some of the small changes can be ignored (slightly different slur
 106 shapes, small variations in note spacing), but this is not always
 107 the case: sometimes even the smallest change means that something
 108 is wrong.  To help distinguish these cases, we use a bigger
 109 staff size when small differences matter.
 110
 111 Staff size 30 generally means @qq{pay extra attention to details}.
 112 Staff size 40 (two times bigger than default size) or more means
 113 that the regtest @strong{is} about the details.
 114
 115 A staff size smaller than the default has no special meaning.
 116
 117 Regression tests whose output is the same for both versions are
 118 not shown in the test comparison.
 119
 120 @itemize
 121 @item
 122 Images: green blurs in the new version show the approximate
 123 location of elements in the old version.
 124
 125 There are often minor adjustments in spacing which do not indicate
 126 any problem.
 127
 128 @item
 129 Log files: show the difference in command-line output.
 130
 131 The main things to examine are changes in page counts -- if a
 132 file used to fit on 1 page but now requires 4 or 5 pages,
 133 something is suspicious!
 134
 135 @item
 136 Profile files: TODO; the purpose of these files is not yet
 137 documented here.
 138
 139 @end itemize
 140
 141 @warning{
 142 The automatic comparison of the regtests checks the LilyPond
 143 bounding boxes.  This means that Ghostscript changes and changes
 144 in lyrics or text are not found.
 145 }
 146
 147 @node Compiling regression tests
 148 @section Compiling regression tests
 149
 150 Developers may wish to see the output of the complete regression
 151 test suite for the current version of the source repository
 152 between releases.  Current source code is available; see
 153 @ref{Working with source code}.
 154
 155 For regression testing @code{../configure} should be run with the
 156 @code{--disable-optimising} option.  Then you will need
 157 to build the LilyPond binary; see @ref{Compiling LilyPond}.
 158
 159 Uninstalling the previous LilyPond version is not necessary, nor is
 160 running @code{make install}, since the tests will automatically be
 161 compiled with the LilyPond binary you have just built in your source
 162 directory.
 163
 164 From this point, the regtests are compiled with:
 165
 166 @example
 167 make test
 168 @end example
 169
 170 If you have a multi-core machine you may want to use the @option{-j}
 171 option and @var{CPU_COUNT} variable, as
 172 described in @ref{Saving time with CPU_COUNT}.
 173 For a quad-core processor the complete command would be:
 174
 175 @example
 176 make -j5 CPU_COUNT=5 test
 177 @end example
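
On a GNU/Linux system, the job count can be derived from the machine
instead of being typed by hand.  The following is only a sketch: it
assumes that @code{nproc} (from GNU coreutils) is available, and it
applies the cores-plus-one rule suggested by the quad-core example
above.

@example
# Assumption: nproc reports the number of available cores.
njobs=$(( $(nproc) + 1 ))
make -j"$njobs" CPU_COUNT="$njobs" test
@end example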
 178
 179 The regtest output will then be available in
 180 @file{input/regression/out-test}.
 181 @file{input/regression/out-test/collated-examples.html}
 182 contains a listing of all the regression tests that were run,
 183 but none of the images are included.  Individual images are
 184 also available in this directory.
 185
 186 The primary use of @samp{make@tie{}test} is to verify that the
 187 regression tests all run without error.  The regression test
 188 page that is part of the documentation is created only when the
 189 documentation is built, as described in @ref{Generating documentation}.
 190 Note that building the documentation requires more installed components
 191 than building the source code, as described in
 192 @ref{Requirements for building documentation}.
 193
 194
 195 @node Regtest comparison
 196 @section Regtest comparison
 197
 198 Before modified code is committed to @code{master} (via @code{staging}),
 199 a regression test
 200 comparison must be completed to ensure that the changes have
 201 not caused problems with previously working code.  The comparison
 202 is made automatically upon compiling the regression test suite
 203 twice.
 204
 205 @enumerate
 206
 207 @item
 208 Run @code{make} with current git master without any of your changes.
 209
 210 @item
 211 Before making changes to the code, establish a baseline for the comparison by
 212 going to the @file{$LILYPOND_GIT/build/} directory and running:
 213
 214 @example
 215 make test-baseline
 216 @end example
 217
 218 @item
 219 Make your changes, or apply the patch(es) to consider.
 220
 221 @item
 222 Compile the source with @samp{make} as usual.
 223
 224 @item
 225 Check for unintentional changes to the regtests:
 226
 227 @example
 228 make check
 229 @end example
 230
 231 After this has finished, a regression test comparison will be
 232 available (relative to the current @file{build/} directory) at:
 233
 234 @example
 235 out/test-results/index.html
 236 @end example
 237
 238 For each regression test that differs between the baseline and the
 239 changed code, a regression test entry will be displayed.  Ideally,
 240 the only changes would be the changes that you were working on.
 241 If regressions are introduced, they must be fixed before
 242 committing the code.
 243
 244 @warning{
 245 The special regression test @file{test-output-distance.ly} will always
 246 show up as a regression.  This test changes each time it is run, and
 247 serves to verify that the regression tests have, in fact, run.}
 248
 249 @item
 250 If you are happy with the results, then stop now.
 251
 252 If you want to continue programming, then make any additional code
 253 changes, and continue.
 254
 255 @item
 256 Compile the source with @samp{make} as usual.
 257
 258 @item
 259 To re-check files that differed between the initial
 260 @samp{make@tie{}test-baseline} and your post-changes
 261 @samp{make@tie{}check}, run:
 262
 263 @example
 264 make test-redo
 265 @end example
 266
 267 This updates the regression list at @file{out/test-results/index.html}.
 268 It does @emph{not} redo @file{test-output-distance.ly}.
 269
 270 @item
 271 When all regressions have been resolved, the output list will be empty.
 272
 273 @item
 274 Once all regressions have been resolved, a final check should be completed
 275 by running:
 276
 277 @example
 278 make test-clean
 279 make check
 280 @end example
 281
 282 This cleans the results of the previous @samp{make@tie{}check}, then does the
 283 automatic regression comparison again.
 284
 285 @end enumerate
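
As a rough summary, the steps above can be condensed into the
following command sequence.  This sketch assumes an out-of-tree build
in @file{$LILYPOND_GIT/build/} and that git master is checked out
without local changes at the start; the comments mark where manual
work happens.

@example
cd $LILYPOND_GIT/build
make                  # build unmodified git master
make test-baseline    # record the baseline output

# ... make your changes, or apply the patch(es) ...
make
make check            # results: out/test-results/index.html

# ... fix any regressions ...
make
make test-redo        # re-check only the tests that differed

# final check, once the regression list is empty
make test-clean
make check
@end example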
 286
 287 @advanced{
 288 Once a test baseline has been established, there is no need to run it again
 289 unless git master has changed.  In other words, if you work with several
 290 branches and want to do a regtest comparison for each of them, you can run
 291 @code{make test-baseline} with git master, check out a branch, run
 292 @code{make} and @code{make check} on it, then switch to another branch and
 293 run @code{make test-clean}, @code{make} and @code{make check} on it without
 294 running @code{make test-baseline} again.}
 295
 296
 297 @node Finding the cause of a regression
 298 @section Finding the cause of a regression
 299
 300 Git has special functionality to help track down the exact
 301 commit which causes a problem.  See the git manual page for
 302 @code{git bisect}.  This is a job that non-programmers can do,
 303 although it requires familiarity with git, ability to compile
 304 LilyPond, and generally a fair amount of technical knowledge.  A
 305 brief summary is given below, but you may need to consult other
 306 documentation for in-depth explanations.
 307
 308 Even if you are not familiar with git or are not able to compile
 309 LilyPond you can still help to narrow down the cause of a
 310 regression simply by downloading the binary releases of different
 311 LilyPond versions and testing them for the regression.  Knowing
 312 which version of LilyPond first exhibited the regression is
 313 helpful to a developer as it shortens the @code{git bisect}
 314 procedure.
 315
 316 Once a problematic commit is identified, the programmers' job is
 317 much easier.  In fact, for most regression bugs, the majority of
 318 the time is spent simply finding the problematic commit.
 319
 320 More information is in @ref{Regression tests}.
 321
 322 @subheading git bisect setup
 323
 324 We need to set up the bisect for each problem we want to
 325 investigate.
 326
 327 Suppose we have an input file which compiled in version 2.13.32,
 328 but fails in version 2.13.38 and above.
 329
 330 @enumerate
 331 @item
 332 Begin the process:
 333
 334 @example
 335 git bisect start
 336 @end example
 337
 338 @item
 339 Give it the earliest known bad tag:
 340
 341 @example
 342 git bisect bad release/2.13.38-1
 343 @end example
 344
 345 (You can list the available tags with @code{git tag}.)
 346
 347 @item
 348 Give it the latest known good tag:
 349
 350 @example
 351 git bisect good release/2.13.32-1
 352 @end example
 353
 354 You should now see something like:
 355 @example
 356 Bisecting: 195 revisions left to test after this (roughly 8 steps)
 357 [b17e2f3d7a5853a30f7d5a3cdc6b5079e77a3d2a] Web: Announcement
 358 update for the new @qq{LilyPond Report}.
 359 @end example
 360
 361 @end enumerate
 362
 363 @subheading git bisect actual
 364
 365 @enumerate
 366
 367 @item
 368 Compile the source:
 369
 370 @example
 371 make
 372 @end example
 373
 374 @item
 375 Test your input file:
 376
 377 @example
 378 out/bin/lilypond test.ly
 379 @end example
 380
 381 @item
 382 Test results?
 383
 384 @itemize
 385 @item
 386 Does it crash, or is the output bad?  If so:
 387
 388 @example
 389 git bisect bad
 390 @end example
 391
 392 @item
 393 Does your input file produce good output?  If so:
 394
 395 @example
 396 git bisect good
 397 @end example
 398
 399 @end itemize
 400
 401 @item
 402 Once the exact problem commit has been identified, git will inform
 403 you with a message like:
 404
 405 @example
 406 6d28aebbaaab1be9961a00bf15a1ef93acb91e30 is the first bad commit
 407 %%% ... blah blah blah ...
 408 @end example
 409
 410 If there is still a range of commits, then git will automatically
 411 select a new version for you to test.  Return to step 1.
 412
 413 @end enumerate
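
If the failure can be detected from the exit status of
@code{lilypond} alone (a crash or a failed compilation, as opposed to
output that merely looks wrong), the manual loop above can be handed to
@code{git bisect run}.  The following is a sketch under assumptions
not stated above: it is run from the top source directory, the build
directory is @file{build/} directly below it, and @file{test.ly} lies
in the top source directory.

@example
git bisect start
git bisect bad release/2.13.38-1
git bisect good release/2.13.32-1
git bisect run sh -c '
  make -C build || exit 125                 # cannot build: skip commit
  build/out/bin/lilypond test.ly || exit 1  # failure or crash: bad
'
git bisect reset    # return to the original checkout when done
@end example

@code{git bisect run} treats exit status 0 as good, 125 as
@qq{skip this commit}, and any other status from 1 to 127 as bad.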
 414
 415 @subheading Recommendation: use two terminal windows
 416
 417 @itemize
 418 @item
 419 One window is open to the @code{build/} directory, and alternates
 420 between these commands:
 421
 422 @example
 423 make
 424 out/bin/lilypond test.ly
 425 @end example
 426
 427 @item
 428 One window is open to the top source directory, and alternates
 429 between these commands:
 430
 431 @example
 432 git bisect good
 433 git bisect bad
 434 @end example
 435
 436 @end itemize
 437
 438
 439 @node Memory and coverage tests
 440 @section Memory and coverage tests
 441
 442 In addition to the graphical output of the regression tests, it is
 443 possible to test memory usage and to determine how much of the source
 444 code has been exercised by the tests.
 445
 446 @subheading Memory usage
 447
 448 For tracking memory usage as part of this test, you will need
 449 GUILE CVS, and in particular the following patch:
 450 @smallexample
 451 @uref{http://www.lilypond.org/vc/old/gub.darcs/patches/guile-1.9-gcstats.patch}
 452 @end smallexample
 453
 454 @subheading Code coverage
 455
 456 To check the coverage of the test suite, do the following:
 457
 458 @example
 459 ./scripts/auxiliar/build-coverage.sh
 460 @emph{# uncovered files, least covered first}
 461 ./scripts/auxiliar/coverage.py  --summary out-cov/*.cc
 462 @emph{# consecutive uncovered lines, longest first}
 463 ./scripts/auxiliar/coverage.py  --uncovered out-cov/*.cc
 464 @end example
 465
 466
 467 @node MusicXML tests
 468 @section MusicXML tests
 469
 470
 471 LilyPond comes with a complete set of regtests for the
 472 @uref{http://www.musicxml.org/,MusicXML} language.  Originally
 473 developed to test @samp{musicxml2ly}, these regression tests
 474 can be used to test any MusicXML implementation.
 475
 476 The MusicXML regression tests are found at
 477 @file{input/regression/musicxml/}.
 478
 479 The output resulting from running these tests
 480 through @samp{musicxml2ly} followed by @samp{lilypond} is
 481 available in the LilyPond documentation:
 482
 483 @example
 484 @uref{http://lilypond.org/doc/latest/input/regression/musicxml/collated-files}
 485 @end example
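
A single test file can be put through the same two steps by hand.  In
this sketch, @file{example.xml} is only a placeholder for one of the
files in @file{input/regression/musicxml/}, and both programs are
assumed to be on the @code{PATH}.

@example
# example.xml is a placeholder file name
musicxml2ly -o example.ly input/regression/musicxml/example.xml
lilypond example.ly
@end example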
 486
 487
 488 @node Grand Regression Test Checking
 489 @section Grand Regression Test Checking
 490
 491 @subheading What is this all about?
 492
 493 Regression tests (usually abbreviated @qq{regtests}) are a collection
 494 of @file{.ly} files used to check whether LilyPond is working correctly.
 495 For example, before version 2.15.12 breve noteheads had an incorrect width,
 496 which resulted in collisions with other objects.  After the issue was fixed,
 497 a small @file{.ly} file demonstrating the problem was added to the regression
 498 tests as proof that the fix works.  If someone accidentally breaks the
 499 breve width again, we will notice this in the output of that regression test.
 500
 501 @subheading How can I help?
 502
 503 We ask you to help us by checking one or two regtests from time to time.
 504 You don't need programming skills to do this, or even LilyPond skills;
 505 basic music notation knowledge is enough.  Checking one regtest takes less
 506 than a minute.  Simply go here:
 507
 508 @example
 509 @uref{http://www.philholmes.net/lilypond/regtests/}
 510 @end example
 511
 512 @subheading Some tips on checking regtests
 513
 514 @subsubheading Description text
 515
 516 The description should be clear even for a music beginner.
 517 If there are any special terms used in the description,
 518 they should all be explained in our @rglosnamed{Top, Music Glossary}
 519 or @rinternalsnamed{Top, Internals Reference}.
 520 Vague descriptions (like @qq{behaves well} or @qq{looks reasonable}) shouldn't be used.
 521
 522 @ignore
 523 this may be useful for advanced regtest checking
 524 @subsubheading Is the regtest straightforward and systematic?
 525
 526 Unfortunately some regtests are written poorly.  A good regtest should be
 527 straightforward: it should be obvious what it checks and how.  Also, it
 528 usually shouldn't check everything at once.  For example, it's a bad idea to test
 529 accidental placement by constructing one huge chord with many suspended notes
 530 and loads of accidentals.  It's better to divide such a problem into a series
 531 of clearly separated cases.
 532 @end ignore