Documentation/contributor/regressions.itexi

   1 @c -*- coding: utf-8; mode: texinfo; -*-
   2 @node Regression tests
   3 @chapter Regression tests
   4
   5 @menu
   6 * Introduction to regression tests::
   7 * Precompiled regression tests::
   8 * Compiling regression tests::
   9 * Regtest comparison::
  10 * Finding the cause of a regression::
  11 * Memory and coverage tests::
  12 * MusicXML tests::
  13 * Grand Regression Test Checking::
  14 @end menu
  15
  16
  17 @node Introduction to regression tests
  18 @section Introduction to regression tests
  19
  20 LilyPond has a complete suite of regression tests that are used
  21 to ensure that changes to the code do not break existing behavior.
  22 These regression tests comprise small LilyPond snippets that test
  23 the functionality of each part of LilyPond.
  24
  25 Regression tests are added when new functionality is added to
  26 LilyPond.
  27 We do not yet have a policy on when it is appropriate to add or
  28 modify a regtest when bugs are fixed.  Individual developers
  29 should use their best judgement until this is clarified during the
  30 @ref{Grand Organization Project (GOP)}.
  31
  32 The regression tests are compiled using special @code{make}
  33 targets.  There are three primary uses for the regression
  34 tests.  First, successful completion of the regression tests means
  35 that LilyPond has been properly built.  Second, the output of the
  36 regression tests can be manually checked to ensure that
  37 the graphical output matches the description of the intended
  38 output.  Third, the regression test output from two different
  39 versions of LilyPond can be automatically compared to identify
  40 any differences.  These differences should then be manually
  41 checked to ensure that the differences are intended.
  42
  43 Regression tests (@qq{regtests}) are available in precompiled form
  44 as part of the documentation.  Regtests can also be compiled
  45 on any machine that has a properly configured LilyPond build
  46 system.
  47
  48
  49 @node Precompiled regression tests
  50 @section Precompiled regression tests
  51
  52 @subheading Regression test output
  53
  54 As part of the release process, the regression tests are run
  55 for every LilyPond release.  Full regression test output is
  56 available for every stable version and the most recent development
  57 version.
  58
  59 Regression test output is available in HTML and PDF format.  Links
  60 to the regression test output are available at the developer's
  61 resources page for the version of interest.
  62
  63 The latest stable version of the regtests is found at:
  64
  65 @example
  66 @uref{http://lilypond.org/doc/stable/input/regression/collated-files.html}
  67 @end example
  68
  69 The latest development version of the regtests is found at:
  70
  71 @example
  72 @uref{http://lilypond.org/doc/latest/input/regression/collated-files.html}
  73 @end example
  74
  75
  76 @subheading Regression test comparison
  77
  78 Each time a new version is released, the regtests are
  79 compiled and the output is automatically compared with the
  80 output of the previous release.  The result of these
  81 comparisons is archived online:
  82
  83 @example
  84 @uref{http://lilypond.org/test/}
  85 @end example
  86
  87 Checking these pages is a very important task for the LilyPond project.
  88 You are invited to report anything that looks broken, or any case
  89 where the output quality is not on par with the previous release,
  90 as described in @rweb{Bug reports}.
  91
  92 @warning{ The special regression test
  93 @file{test-output-distance.ly} will always show up as a
  94 regression.  This test changes each time it is run, and serves to
  95 verify that the regression tests have, in fact, run.}
  96
  97
  98 @subheading What to look for
  99
 100 The test comparison shows all of the changes that occurred between
 101 the current release and the prior release.  Each test that has a
 102 significant difference in output is displayed, with the old
 103 version on the left and the new version on the right.
 104
 105 Regression tests whose output is the same for both versions are
 106 not shown in the test comparison.
 107
 108 @itemize
 109 @item
 110 Images: green blurs in the new version show the approximate
 111 location of elements in the old version.
 112
 113 There are often minor adjustments in spacing which do not indicate
 114 any problem.
 115
 116 @item
 117 Log files: show the difference in command-line output.
 118
 119 The main thing to examine is any change in page counts -- if a
 120 file used to fit on 1 page but now requires 4 or 5 pages,
 121 something is suspicious!
 122
 123 @item
 124 Profile files: give information about
 125 TODO?  I don't know what they're for.
 126
 127 @end itemize
 128
 129 @warning{
 130 The automatic comparison of the regtests checks only the LilyPond
 131 bounding boxes.  This means that changes caused by Ghostscript and
 132 changes in lyrics or text are not detected.
 133 }
 134
 135 @node Compiling regression tests
 136 @section Compiling regression tests
 137
 138 Developers may wish to see the output of the complete regression
 139 test suite for the current version of the source repository
 140 between releases.  Current source code is available; see
 141 @ref{Working with source code}.
 142
 143 For regression testing, @code{../configure} should be run with the
 144 @code{--disable-optimising} option.  Then you will need
 145 to build the LilyPond binary; see @ref{Compiling LilyPond}.
 146
 147 Uninstalling the previous LilyPond version is not necessary, nor is
 148 running @code{make install}, since the tests will automatically be
 149 compiled with the LilyPond binary you have just built in your source
 150 directory.
 151
 152 From this point, the regtests are compiled with:
 153
 154 @example
 155 make test
 156 @end example
 157
 158 If you have a multi-core machine you may want to use the @option{-j}
 159 option and @var{CPU_COUNT} variable, as
 160 described in @ref{Saving time with CPU_COUNT}.
 161 For a quad-core processor the complete command would be:
 162
 163 @example
 164 make -j5 CPU_COUNT=5 test
 165 @end example
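
The job count in the command above can also be derived from the machine
instead of being typed by hand.  The following is a minimal sketch, not part
of the LilyPond build system: the @qq{cores + 1} rule is an assumption read
off the quad-core example, and the script only prints the resulting command.

```shell
#!/bin/sh
# Sketch: derive the job count from the number of online cores.
# The "cores + 1" rule is an assumption taken from the quad-core example
# (4 cores -> 5 jobs); it is not a documented LilyPond requirement.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
jobs=$((cores + 1))

# Print the command instead of running it, so the sketch is safe to try
# outside a configured build directory.
echo "make -j$jobs CPU_COUNT=$jobs test"
```

Copy the printed line into the build directory shell once the value looks
sensible for your machine.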
 166
 167 The regtest output will then be available in
 168 @file{input/regression/out-test}.
 169 @file{input/regression/out-test/collated-examples.html}
 170 contains a listing of all the regression tests that were run,
 171 but none of the images are included.  Individual images are
 172 also available in this directory.
 173
 174 The primary use of @samp{make@tie{}test} is to verify that the
 175 regression tests all run without error.  The regression test
 176 page that is part of the documentation is created only when the
 177 documentation is built, as described in @ref{Generating documentation}.
 178 Note that building the documentation requires more installed components
 179 than building the source code, as described in
 180 @ref{Requirements for building documentation}.
 181
 182
 183 @node Regtest comparison
 184 @section Regtest comparison
 185
 186 Before modified code is committed to master, a regression test
 187 comparison must be completed to ensure that the changes have
 188 not caused problems with previously working code.  The comparison
 189 is made automatically upon compiling the regression test suite
 190 twice.
 191
 192 @enumerate
 193
 194 @item
 195 Run @code{make} with current git master without any of your changes.
 196
 197 @item
 198 Before making changes to the code, establish a baseline for the comparison by
 199 going to the @file{lilypond-git/build/} directory and running:
 200
 201 @example
 202 make test-baseline
 203 @end example
 204
 205 @item
 206 Make your changes, or apply the patch(es) to consider.
 207
 208 @item
 209 Compile the source with @samp{make} as usual.
 210
 211 @item
 212 Check for unintentional changes to the regtests:
 213
 214 @example
 215 make check
 216 @end example
 217
 218 After this has finished, a regression test comparison will be
 219 available (relative to the current @file{build/} directory) at:
 220
 221 @example
 222 out/test-results/index.html
 223 @end example
 224
 225 For each regression test that differs between the baseline and the
 226 changed code, a regression test entry will be displayed.  Ideally,
 227 the only changes would be the changes that you were working on.
 228 If regressions are introduced, they must be fixed before
 229 committing the code.
 230
 231 @warning{
 232 The special regression test @file{test-output-distance.ly} will always
 233 show up as a regression.  This test changes each time it is run, and
 234 serves to verify that the regression tests have, in fact, run.}
 235
 236 @item
 237 If you are happy with the results, then stop now.
 238
 239 If you want to continue programming, then make any additional code
 240 changes, and continue.
 241
 242 @item
 243 Compile the source with @samp{make} as usual.
 244
 245 @item
 246 To re-check files that differed between the initial
 247 @samp{make@tie{}test-baseline} and your post-changes
 248 @samp{make@tie{}check}, run:
 249
 250 @example
 251 make test-redo
 252 @end example
 253
 254 This updates the regression list at @file{out/test-results/index.html}.
 255 It does @emph{not} redo @file{test-output-distance.ly}.
 256
 257 @item
 258 When all regressions have been resolved, the output list will be empty.
 259
 260 @item
 261 Once all regressions have been resolved, a final check should be completed
 262 by running:
 263
 264 @example
 265 make test-clean
 266 make check
 267 @end example
 268
 269 This cleans the results of the previous @samp{make@tie{}check}, then does the
 270 automatic regression comparison again.
 271
 272 @end enumerate
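
The phases of the procedure above can be collected in a small helper script.
This is only a sketch under stated assumptions: the function names and the
@code{DRY_RUN} variable are hypothetical, and only the @code{make} targets
come from the steps above.  By default the commands are printed rather than
executed.

```shell
#!/bin/sh
# Sketch of the regtest comparison cycle.  DRY_RUN (hypothetical name)
# defaults to 1, so commands are only printed; set DRY_RUN=0 inside a
# configured build directory to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "+ $*"
    else
        "$@" || exit 1    # stop at the first failing command
    fi
}

# Phase 1: on unmodified git master, before making any changes.
baseline() {
    run make
    run make test-baseline
}

# Phase 2: after making changes or applying patches.
compare() {
    run make
    run make check
}

# Phase 3: after further fixes, re-check only the files that differed.
recheck() {
    run make
    run make test-redo
}

# Phase 4: final confirmation once all regressions are resolved.
final_check() {
    run make test-clean
    run make check
}
```

Source the script, call @code{baseline} once on git master, then call
@code{compare}, @code{recheck} and @code{final_check} as you work through
your changes.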
 273
 274 @advanced{
 275 Once a test baseline has been established, there is no need to run it again
 276 unless git master has changed.  In other words, if you work with several
 277 branches and want to do a regtest comparison for each of them, you can run
 278 @code{make test-baseline} with git master, check out some branch, run
 279 @code{make} and @code{make check} on it, then switch to another branch and run
 280 @code{make test-clean}, @code{make} and @code{make check} on it without doing
 281 @code{make test-baseline} again.}
 282
 283
 284 @node Finding the cause of a regression
 285 @section Finding the cause of a regression
 286
 287 Git has special functionality to help track down the exact
 288 commit that causes a problem.  See the git manual page for
 289 @code{git bisect}.  This is a job that non-programmers can do,
 290 although it requires familiarity with git, ability to compile
 291 LilyPond, and generally a fair amount of technical knowledge.  A
 292 brief summary is given below, but you may need to consult other
 293 documentation for in-depth explanations.
 294
 295 Even if you are not familiar with git or are not able to compile
 296 LilyPond you can still help to narrow down the cause of a
 297 regression simply by downloading the binary releases of different
 298 LilyPond versions and testing them for the regression.  Knowing
 299 which version of LilyPond first exhibited the regression is
 300 helpful to a developer as it shortens the @code{git bisect}
 301 procedure.
 302
 303 Once a problematic commit is identified, the programmers' job is
 304 much easier.  In fact, for most regression bugs, the majority of
 305 the time is spent simply finding the problematic commit.
 306
 307 More information is in @ref{Regression tests}.
 308
 309 @subheading git bisect setup
 310
 311 We need to set up the bisect for each problem we want to
 312 investigate.
 313
 314 Suppose we have an input file which compiled in version 2.13.32,
 315 but fails in version 2.13.38 and above.
 316
 317 @enumerate
 318 @item
 319 Begin the process:
 320
 321 @example
 322 git bisect start
 323 @end example
 324
 325 @item
 326 Give it the earliest known bad tag:
 327
 328 @example
 329 git bisect bad release/2.13.38-1
 330 @end example
 331
 332 (You can list the available tags with @code{git tag}.)
 333
 334 @item
 335 Give it the latest known good tag:
 336
 337 @example
 338 git bisect good release/2.13.32-1
 339 @end example
 340
 341 You should now see something like:
 342 @example
 343 Bisecting: 195 revisions left to test after this (roughly 8 steps)
 344 [b17e2f3d7a5853a30f7d5a3cdc6b5079e77a3d2a] Web: Announcement
 345 update for the new @qq{LilyPond Report}.
 346 @end example
 347
 348 @end enumerate
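
The step estimate in that message follows from bisection halving the range of
candidate commits at each step.  The sketch below reproduces the arithmetic
for the 195 revisions in the sample output; it is a plain halving loop for
illustration, not git's own estimation code.

```shell
#!/bin/sh
# Count how many halvings are needed to narrow 195 candidates down to one.
revisions=195
steps=0
n=$revisions
while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))    # keep the larger half, rounding up
    steps=$((steps + 1))
done
echo "$steps"
```

For 195 revisions the loop runs 8 times, which matches the @qq{roughly 8
steps} reported above.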
 349
 350 @subheading git bisect actual
 351
 352 @enumerate
 353
 354 @item
 355 Compile the source:
 356
 357 @example
 358 make
 359 @end example
 360
 361 @item
 362 Test your input file:
 363
 364 @example
 365 out/bin/lilypond test.ly
 366 @end example
 367
 368 @item
 369 Test results?
 370
 371 @itemize
 372 @item
 373 Does it crash, or is the output bad?  If so:
 374
 375 @example
 376 git bisect bad
 377 @end example
 378
 379 @item
 380 Does your input file produce good output?  If so:
 381
 382 @example
 383 git bisect good
 384 @end example
 385
 386 @end itemize
 387
 388 @item
 389 Once the exact problem commit has been identified, git will inform
 390 you with a message like:
 391
 392 @example
 393 6d28aebbaaab1be9961a00bf15a1ef93acb91e30 is the first bad commit
 394 %%% ... blah blah blah ...
 395 @end example
 396
 397 If there is still a range of commits, then git will automatically
 398 select a new version for you to test.  Return to step 1.
 399
 400 @end enumerate
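
When the regression shows up as a failing compile of the input file, the
loop above can be handed to @code{git bisect run}, which reads the exit
status of a command: 0 marks the commit as good, 125 skips it, and other
values from 1 to 127 mark it as bad.  The following is a minimal sketch; the
function name @code{bisect_status} and the script name in the comments are
hypothetical, and it assumes that @code{make} and @code{out/bin/lilypond}
can be run from the directory where the script is started.  Bad graphical
output cannot be detected this way and still needs a manual check.

```shell
#!/bin/sh
# Map a build result and a test result to the exit status that
# "git bisect run" expects: 0 = good, 1 = bad, 125 = skip this commit.
bisect_status() {
    build_rc=$1
    test_rc=$2
    if [ "$build_rc" -ne 0 ]; then
        echo 125    # the commit does not build, so it cannot be judged
    elif [ "$test_rc" -ne 0 ]; then
        echo 1      # lilypond failed on the input file
    else
        echo 0      # the input file compiled
    fi
}

# Intended use inside a hypothetical bisect-step.sh (left as comments so
# that this sketch does not start a build when it is merely sourced):
#
#   make;                         build_rc=$?
#   out/bin/lilypond test.ly;     test_rc=$?
#   exit "$(bisect_status "$build_rc" "$test_rc")"
#
# and then:  git bisect run ./bisect-step.sh
```

Skipping commits that do not build keeps an unrelated build failure from
being reported as the first bad commit.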
 401
 402 @subheading Recommendation: use two terminal windows
 403
 404 @itemize
 405 @item
 406 One window is open to the @code{build/} directory, and alternates
 407 between these commands:
 408
 409 @example
 410 make
 411 out/bin/lilypond test.ly
 412 @end example
 413
 414 @item
 415 One window is open to the top source directory, and alternates
 416 between these commands:
 417
 418 @example
 419 git bisect good
 420 git bisect bad
 421 @end example
 422
 423 @end itemize
 424
 425
 426 @node Memory and coverage tests
 427 @section Memory and coverage tests
 428
 429 In addition to the graphical output of the regression tests, it is
 430 possible to test memory usage and to determine how much of the source
 431 code has been exercised by the tests.
 432
 433 @subheading Memory usage
 434
 435 For tracking memory usage as part of this test, you will need
 436 GUILE CVS, and in particular the following patch:
 437 @smallexample
 438 @uref{http://www.lilypond.org/vc/old/gub.darcs/patches/guile-1.9-gcstats.patch}
 439 @end smallexample
 440
 441 @subheading Code coverage
 442
 443 For checking the coverage of the test suite, do the following:
 444
 445 @example
 446 ./scripts/auxiliar/build-coverage.sh
 447 @emph{# uncovered files, least covered first}
 448 ./scripts/auxiliar/coverage.py  --summary out-cov/*.cc
 449 @emph{# consecutive uncovered lines, longest first}
 450 ./scripts/auxiliar/coverage.py  --uncovered out-cov/*.cc
 451 @end example
 452
 453
 454 @node MusicXML tests
 455 @section MusicXML tests
 456
 457
 458 LilyPond comes with a complete set of regtests for the
 459 @uref{http://www.musicxml.org/,MusicXML} language.  Originally
 460 developed to test @samp{musicxml2ly}, these regression tests
 461 can be used to test any MusicXML implementation.
 462
 463 The MusicXML regression tests are found at
 464 @file{input/regression/musicxml/}.
 465
 466 The output resulting from running these tests
 467 through @samp{musicxml2ly} followed by @samp{lilypond} is
 468 available in the LilyPond documentation:
 469
 470 @example
 471 @uref{http://lilypond.org/doc/latest/input/regression/musicxml/collated-files}
 472 @end example
 473
 474
 475 @node Grand Regression Test Checking
 476 @section Grand Regression Test Checking
 477
 478 @subheading What is this all about?
 479
 480 Regression tests (usually abbreviated @qq{regtests}) are a collection
 481 of @file{.ly} files used to check whether LilyPond is working correctly.
 482 For example, before version 2.15.12 breve noteheads had an incorrect width,
 483 which resulted in collisions with other objects.  After the issue was fixed,
 484 a small @file{.ly} file demonstrating the problem was added to the regression
 485 tests as proof that the fix works.  If someone accidentally breaks the
 486 breve width again, we will notice this in the output of that regression test.
 487
 488 We are asking you to help us by checking a regtest or two from time to time.
 489 You don't need programming skills to do this, or even LilyPond skills;
 490 basic music notation knowledge is enough, and checking one regtest takes less
 491 than a minute.  Simply go here:
 492
 493 @example
 494 @uref{http://www.philholmes.net/lilypond/regtests/}
 495 @end example
 496
 497 @subheading Some tips on checking regtests
 498
 499 @subsubheading Description text
 500
 501 The description should be clear even for a music beginner.
 502 If there are any special terms used in the description,
 503 they all should be explained in our @rglosnamed{Top, Music Glossary}
 504 or @rinternalsnamed{Top, Internals Reference}.
 505 Vague descriptions (like @qq{behaves well} or @qq{looks reasonable}) shouldn't be used.
 506
 507 @ignore
 508 @subsubheading Is the regtest straightforward and systematic?
 509
 510 Unfortunately, some regtests are written poorly.  A good regtest should be
 511 straightforward: it should be obvious what it checks and how.  Also, it
 512 usually shouldn't check everything at once.  For example, it's a bad idea to test
 513 accidental placement by constructing one huge chord with many suspended notes
 514 and loads of accidentals.  It's better to divide such a problem into a series
 515 of clearly separated cases.
 516 @end ignore