Documentation/contributor/regressions.itexi

   1 @c -*- coding: utf-8; mode: texinfo; -*-
   2 @node Regression tests
   3 @chapter Regression tests
   4
   5 @menu
   6 * Introduction to regression tests::
   7 * Precompiled regression tests::
   8 * Compiling regression tests::
   9 * Regtest comparison::
  10 * Finding the cause of a regression::
  11 * Memory and coverage tests::
  12 * MusicXML tests::
  13 * Grand Regression Test Checking::
  14 @end menu
  15
  16
  17 @node Introduction to regression tests
  18 @section Introduction to regression tests
  19
  20 LilyPond has a complete suite of regression tests that are used
  21 to ensure that changes to the code do not break existing behavior.
  22 These regression tests comprise small LilyPond snippets that test
  23 the functionality of each part of LilyPond.
  24
  25 Regression tests are added when new functionality is added to
  26 LilyPond.
  27 We do not yet have a policy on when it is appropriate to add or
  28 modify a regtest when bugs are fixed.  Individual developers
  29 should use their best judgement until this is clarified during the
  30 @ref{Grand Organization Project (GOP)}.
  31
  32 The regression tests are compiled using special @code{make}
  33 targets.  There are three primary uses for the regression
  34 tests.  First, successful completion of the regression tests means
  35 that LilyPond has been properly built.  Second, the output of the
  36 regression tests can be manually checked to ensure that
  37 the graphical output matches the description of the intended
  38 output.  Third, the regression test output from two different
  39 versions of LilyPond can be automatically compared to identify
  40 any differences.  These differences should then be manually
  41 checked to ensure that the differences are intended.
  42
  43 Regression tests (@qq{regtests}) are available in precompiled form
  44 as part of the documentation.  Regtests can also be compiled
  45 on any machine that has a properly configured LilyPond build
  46 system.
  47
  48
  49 @node Precompiled regression tests
  50 @section Precompiled regression tests
  51
  52 @subheading Regression test output
  53
  54 As part of the release process, the regression tests are run
  55 for every LilyPond release.  Full regression test output is
  56 available for every stable version and the most recent development
  57 version.
  58
  59 Regression test output is available in HTML and PDF format.  Links
  60 to the regression test output are available at the developer's
  61 resources page for the version of interest.
  62
  63 The latest stable version of the regtests is found at:
  64
  65 @example
  66 @uref{http://lilypond.org/doc/stable/input/regression/collated-files.html}
  67 @end example
  68
  69 The latest development version of the regtests is found at:
  70
  71 @example
  72 @uref{http://lilypond.org/doc/latest/input/regression/collated-files.html}
  73 @end example
  74
  75
  76 @subheading Regression test comparison
  77
  78 Each time a new version is released, the regtests are
  79 compiled and the output is automatically compared with the
  80 output of the previous release.  The result of these
  81 comparisons is archived online:
  82
  83 @example
  84 @uref{http://lilypond.org/test/}
  85 @end example
  86
  87 Checking these pages is a very important task for the LilyPond project.
  88 You are invited to report anything that looks broken, or any case
  89 where the output quality is not on par with the previous release,
  90 as described in @rweb{Bug reports}.
  91
  92 @warning{ The special regression test
  93 @file{test-output-distance.ly} will always show up as a
  94 regression.  This test changes each time it is run, and serves to
  95 verify that the regression tests have, in fact, run.}
  96
  97
  98 @subheading What to look for
  99
 100 The test comparison shows all of the changes that occurred between
 101 the current release and the prior release.  Each test that has a
 102 significant (noticeable) difference in output is displayed, with
 103 the old version on the left and the new version on the right.
 104
 105 Some of the small changes can be ignored (slightly different slur
 106 shapes, small variations in note spacing), but this is not always
 107 the case: sometimes even the smallest change means that something
 108 is wrong.  To help distinguish these cases, we use a bigger
 109 staff size when small differences matter.
 110
 111 Staff size 30 generally means @qq{pay extra attention to details}.
 112 Staff size 40 (twice the default size) or more means
 113 that the regtest @strong{is} about the details.
 114
 115 A staff size smaller than the default carries no special meaning.
 116
 117 Regression tests whose output is the same for both versions are
 118 not shown in the test comparison.
 119
 120 @itemize
 121 @item
 122 Images: green blurs in the new version show the approximate
 123 location of elements in the old version.
 124
 125 There are often minor adjustments in spacing which do not indicate
 126 any problem.
 127
 128 @item
 129 Log files: show the difference in command-line output.
 130
 131 The main thing to examine is any change in page counts -- if a
 132 file used to fit on 1 page but now requires 4 or 5 pages,
 133 something is suspicious!
 134
 135 @item
 136 Profile files: apparently give some information about CPU usage.
 137 @c TODO: document precisely what the profile files are for.
 138 If you see a large number of changes in cell counts, this probably
 139 means that you ran @code{make test-baseline} with a different
 140 number of CPU threads than @code{make check}.  Try redoing the
 141 tests from scratch with the same number of threads each time;
 142 see @ref{Saving time with the -j option}.
 143
 144 @end itemize
 145
 146 @warning{
 147 The automatic comparison of the regtests checks the LilyPond
 148 bounding boxes.  This means that Ghostscript changes and changes
 149 in lyrics or text are not found.
 150 }
 151
 152 @node Compiling regression tests
 153 @section Compiling regression tests
 154
 155 Developers may wish to see the output of the complete regression
 156 test suite for the current version of the source repository
 157 between releases.  Current source code is available; see
 158 @ref{Working with source code}.
 159
 160 For regression testing @code{../configure} should be run with the
 161 @code{--disable-optimising} option.  Then you will need
 162 to build the LilyPond binary; see @ref{Compiling LilyPond}.
 163
 164 Uninstalling the previous LilyPond version is not necessary, nor is
 165 running @code{make install}, since the tests will automatically be
 166 compiled with the LilyPond binary you have just built in your source
 167 directory.
 168
 169 From this point, the regtests are compiled with:
 170
 171 @example
 172 make test
 173 @end example
 174
 175 If you have a multi-core machine you may want to use the @option{-j}
 176 option and @var{CPU_COUNT} variable, as
 177 described in @ref{Saving time with CPU_COUNT}.
 178 For a quad-core processor the complete command would be:
 179
 180 @example
 181 make -j5 CPU_COUNT=5 test
 182 @end example
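
As a rough sketch, the job count can be derived from the number of
cores instead of being typed by hand.  This assumes that the
@command{nproc} utility (GNU coreutils) is installed and that the
cores-plus-one value used in the quad-core example above is what
you want; @code{JOBS} is only an illustrative variable name.

@example
JOBS=$(( $(nproc) + 1 ))
make -j"$JOBS" CPU_COUNT="$JOBS" test
@end example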
 183
 184 The regtest output will then be available in
 185 @file{input/regression/out-test}.
 186 @file{input/regression/out-test/collated-examples.html}
 187 contains a listing of all the regression tests that were run,
 188 but none of the images are included.  Individual images are
 189 also available in this directory.
 190
 191 The primary use of @samp{make@tie{}test} is to verify that the
 192 regression tests all run without error.  The regression test
 193 page that is part of the documentation is created only when the
 194 documentation is built, as described in @ref{Generating documentation}.
 195 Note that building the documentation requires more installed components
 196 than building the source code, as described in
 197 @ref{Requirements for building documentation}.
 198
 199
 200 @node Regtest comparison
 201 @section Regtest comparison
 202
 203 Before modified code is committed to @code{master} (via @code{staging}),
 204 a regression test
 205 comparison must be completed to ensure that the changes have
 206 not caused problems with previously working code.  The comparison
 207 is made automatically upon compiling the regression test suite
 208 twice.
 209
 210 @enumerate
 211
 212 @item
 213 Run @code{make} with current git master without any of your changes.
 214
 215 @item
 216 Before making changes to the code, establish a baseline for the comparison by
 217 going to the @file{$LILYPOND_GIT/build/} directory and running:
 218
 219 @example
 220 make test-baseline
 221 @end example
 222
 223 @item
 224 Make your changes, or apply the patch(es) to consider.
 225
 226 @item
 227 Compile the source with @samp{make} as usual.
 228
 229 @item
 230 Check for unintentional changes to the regtests:
 231
 232 @example
 233 make check
 234 @end example
 235
 236 After this has finished, a regression test comparison will be
 237 available (relative to the current @file{build/} directory) at:
 238
 239 @example
 240 out/test-results/index.html
 241 @end example
 242
 243 For each regression test that differs between the baseline and the
 244 changed code, a regression test entry will be displayed.  Ideally,
 245 the only changes would be the changes that you were working on.
 246 If regressions are introduced, they must be fixed before
 247 committing the code.
 248
 249 @warning{
 250 The special regression test @file{test-output-distance.ly} will always
 251 show up as a regression.  This test changes each time it is run, and
 252 serves to verify that the regression tests have, in fact, run.}
 253
 254 @item
 255 If you are happy with the results, then stop now.
 256
 257 If you want to continue programming, then make any additional code
 258 changes, and continue.
 259
 260 @item
 261 Compile the source with @samp{make} as usual.
 262
 263 @item
 264 To re-check files that differed between the initial
 265 @samp{make@tie{}test-baseline} and your post-changes
 266 @samp{make@tie{}check}, run:
 267
 268 @example
 269 make test-redo
 270 @end example
 271
 272 This updates the regression list at @file{out/test-results/index.html}.
 273 It does @emph{not} redo @file{test-output-distance.ly}.
 274
 275 @item
 276 When all regressions have been resolved, the output list will be empty.
 277
 278 @item
 279 Once all regressions have been resolved, a final check should be completed
 280 by running:
 281
 282 @example
 283 make test-clean
 284 make check
 285 @end example
 286
 287 This cleans the results of the previous @samp{make@tie{}check}, then does the
 288 automatic regression comparison again.
 289
 290 @end enumerate
 291
 292 @advanced{
 293 Once a test baseline has been established, there is no need to rebuild it
 294 unless git master has changed.  In other words, if you work with several
 295 branches and want to do a regtest comparison for each of them, you can run
 296 @code{make test-baseline} with git master, check out a branch, run
 297 @code{make} and @code{make check} on it, then switch to another branch and
 298 run @code{make test-clean}, @code{make} and @code{make check} without
 299 doing @code{make test-baseline} again.}
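
The multi-branch workflow described above could look as follows.
This is a sketch only: @code{branch-a} and @code{branch-b} are
placeholder branch names, and all commands are assumed to be run
from the @file{build/} directory.

@example
git checkout master
make
make test-baseline
git checkout branch-a
make
make check
git checkout branch-b
make test-clean
make
make check
@end example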
 300
 301
 302 @node Finding the cause of a regression
 303 @section Finding the cause of a regression
 304
 305 Git has special functionality to help track down the exact
 306 commit which causes a problem.  See the git manual page for
 307 @code{git bisect}.  This is a job that non-programmers can do,
 308 although it requires familiarity with git, the ability to compile
 309 LilyPond, and generally a fair amount of technical knowledge.  A
 310 brief summary is given below, but you may need to consult other
 311 documentation for in-depth explanations.
 312
 313 Even if you are not familiar with git or are not able to compile
 314 LilyPond you can still help to narrow down the cause of a
 315 regression simply by downloading the binary releases of different
 316 LilyPond versions and testing them for the regression.  Knowing
 317 which version of LilyPond first exhibited the regression is
 318 helpful to a developer as it shortens the @code{git bisect}
 319 procedure.
 320
 321 Once a problematic commit is identified, the programmers' job is
 322 much easier.  In fact, for most regression bugs, the majority of
 323 the time is spent simply finding the problematic commit.
 324
 325 More information is in @ref{Regression tests}.
 326
 327 @subheading git bisect setup
 328
 329 We need to set up the bisect for each problem we want to
 330 investigate.
 331
 332 Suppose we have an input file which compiled in version 2.13.32,
 333 but fails in version 2.13.38 and above.
 334
 335 @enumerate
 336 @item
 337 Begin the process:
 338
 339 @example
 340 git bisect start
 341 @end example
 342
 343 @item
 344 Give it the earliest known bad tag:
 345
 346 @example
 347 git bisect bad release/2.13.38-1
 348 @end example
 349
 350 (You can list the available tags with @code{git tag}.)
 351
 352 @item
 353 Give it the latest known good tag:
 354
 355 @example
 356 git bisect good release/2.13.32-1
 357 @end example
 358
 359 You should now see something like:
 360 @example
 361 Bisecting: 195 revisions left to test after this (roughly 8 steps)
 362 [b17e2f3d7a5853a30f7d5a3cdc6b5079e77a3d2a] Web: Announcement
 363 update for the new @qq{LilyPond Report}.
 364 @end example
 365
 366 @end enumerate
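
The three commands above can also be combined, since
@code{git bisect start} accepts the bad revision followed by one or
more good revisions.  A sketch using the same tags:

@example
git bisect start release/2.13.38-1 release/2.13.32-1
@end example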
 367
 368 @subheading git bisect actual
 369
 370 @enumerate
 371
 372 @item
 373 Compile the source:
 374
 375 @example
 376 make
 377 @end example
 378
 379 @item
 380 Test your input file:
 381
 382 @example
 383 out/bin/lilypond test.ly
 384 @end example
 385
 386 @item
 387 Test results?
 388
 389 @itemize
 390 @item
 391 Does it crash, or is the output bad?  If so:
 392
 393 @example
 394 git bisect bad
 395 @end example
 396
 397 @item
 398 Does your input file produce good output?  If so:
 399
 400 @example
 401 git bisect good
 402 @end example
 403
 404 @end itemize
 405
 406 @item
 407 Once the exact problem commit has been identified, git will inform
 408 you with a message like:
 409
 410 @example
 411 6d28aebbaaab1be9961a00bf15a1ef93acb91e30 is the first bad commit
 412 %%% ... blah blah blah ...
 413 @end example
 414
 415 If a range of commits still remains, git will automatically
 416 select a new revision for you to test.  Return to step 1.
 417
 418 @end enumerate
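
When the regression shows up as a crash or a failed compilation,
the build-and-test loop can be handed to @code{git bisect run}.
The following is only a sketch: the script name
@file{bisect-test.sh} is hypothetical, and it assumes that the
build directory is @file{build/}, that @file{test.ly} has been
copied there, and that a failing revision makes @code{lilypond}
exit with a non-zero status.  Regressions that only show in the
graphical output still need the manual steps above.

@example
#!/bin/sh
# bisect-test.sh (hypothetical); run from the top source directory
cd build || exit 128                 # no build directory: abort the bisect
make || exit 125                     # cannot build: skip this revision
out/bin/lilypond test.ly || exit 1   # any failure: revision is bad
exit 0                               # success: revision is good
@end example

@example
git bisect run sh bisect-test.sh
git bisect reset
@end example

@code{git bisect reset} ends the bisect session and returns to the
commit that was checked out before @code{git bisect start}.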
 419
 420 @subheading Recommendation: use two terminal windows
 421
 422 @itemize
 423 @item
 424 One window is open to the @code{build/} directory, and alternates
 425 between these commands:
 426
 427 @example
 428 make
 429 out/bin/lilypond test.ly
 430 @end example
 431
 432 @item
 433 One window is open to the top source directory, and alternates
 434 between these commands:
 435
 436 @example
 437 git bisect good
 438 git bisect bad
 439 @end example
 440
 441 @end itemize
 442
 443
 444 @node Memory and coverage tests
 445 @section Memory and coverage tests
 446
 447 In addition to the graphical output of the regression tests, it is
 448 possible to test memory usage and to determine how much of the source
 449 code has been exercised by the tests.
 450
 451 @subheading Memory usage
 452
 453 For tracking memory usage as part of this test, you will need
 454 GUILE CVS; especially the following patch:
 455 @smallexample
 456 @uref{http://www.lilypond.org/vc/old/gub.darcs/patches/guile-1.9-gcstats.patch}
 457 @end smallexample
 458
 459 @subheading Code coverage
 460
 461 For checking the coverage of the test suite, do the following:
 462
 463 @example
 464 ./scripts/auxiliar/build-coverage.sh
 465 @emph{# uncovered files, least covered first}
 466 ./scripts/auxiliar/coverage.py  --summary out-cov/*.cc
 467 @emph{# consecutive uncovered lines, longest first}
 468 ./scripts/auxiliar/coverage.py  --uncovered out-cov/*.cc
 469 @end example
 470
 471
 472 @node MusicXML tests
 473 @section MusicXML tests
 474
 475
 476 LilyPond comes with a complete set of regtests for the
 477 @uref{http://www.musicxml.org/,MusicXML} language.  Originally
 478 developed to test @samp{musicxml2ly}, these regression tests
 479 can be used to test any MusicXML implementation.
 480
 481 The MusicXML regression tests are found at
 482 @file{input/regression/musicxml/}.
 483
 484 The output resulting from running these tests
 485 through @samp{musicxml2ly} followed by @samp{lilypond} is
 486 available in the LilyPond documentation:
 487
 488 @example
 489 @uref{http://lilypond.org/doc/latest/input/regression/musicxml/collated-files}
 490 @end example
 491
 492
 493 @node Grand Regression Test Checking
 494 @section Grand Regression Test Checking
 495
 496 @subheading What is this all about?
 497
 498 Regression tests (usually abbreviated @qq{regtests}) are a collection
 499 of @file{.ly} files used to check whether LilyPond is working correctly.
 500 For example, before version 2.15.12 breve noteheads had an incorrect width,
 501 which resulted in collisions with other objects.  After the issue was fixed,
 502 a small @file{.ly} file demonstrating the problem was added to the regression
 503 tests as proof that the fix works.  If someone accidentally breaks the
 504 breve width again, we will notice this in the output of that regression test.
 505
 506 @subheading How can I help?
 507
 508 We ask you to help us by checking one or two regtests from time to time.
 509 You don't need programming skills to do this, not even LilyPond skills --
 510 just basic music notation knowledge; checking one regtest takes less than
 511 a minute.  Simply go here:
 512
 513 @example
 514 @uref{http://www.philholmes.net/lilypond/regtests/}
 515 @end example
 516
 517 @subheading Some tips on checking regtests
 518
 519 @subsubheading Description text
 520
 521 The description should be clear even for a music beginner.
 522 If there are any special terms used in the description,
 523 they all should be explained in our @rglosnamed{Top, Music Glossary}
 524 or @rinternalsnamed{Top, Internals Reference}.
 525 Vague descriptions (like @qq{behaves well} or @qq{looks reasonable}) shouldn't be used.
 526
 527 @ignore
 528 this may be useful for advanced regtest checking
 529 @subsubheading Is the regtest straightforward and systematic?
 530
 531 Unfortunately some regtests are written poorly.  A good regtest should be
 532 straightforward: it should be obvious what it checks and how.  Also, it
 533 usually shouldn't check everything at once.  For example it's a bad idea to test
 534 accidental placement by constructing one huge chord with many suspended notes
 535 and loads of accidentals.  It's better to divide such a problem into a series
 536 of clearly separated cases.
 537 @end ignore