Documentation/contributor/regressions.itexi

   1 @c -*- coding: utf-8; mode: texinfo; -*-
   2 @node Regression tests
   3 @chapter Regression tests
   4
   5 @menu
   6 * Introduction to regression tests::
   7 * Precompiled regression tests::
   8 * Compiling regression tests::
   9 * Regtest comparison::
  10 * Finding the cause of a regression::
  11 * Memory and coverage tests::
  12 * MusicXML tests::
  13 * Grand Regression Test Checking::
  14 @end menu
  15
  16
  17 @node Introduction to regression tests
  18 @section Introduction to regression tests
  19
  20 LilyPond has a complete suite of regression tests that are used
  21 to ensure that changes to the code do not break existing behavior.
  22 These regression tests comprise small LilyPond snippets that test
  23 the functionality of each part of LilyPond.
  24
  25 Regression tests are added when new functionality is added to
  26 LilyPond.
  27 We do not yet have a policy on when it is appropriate to add or
  28 modify a regtest when bugs are fixed.  Individual developers
  29 should use their best judgement until this is clarified during the
  30 @ref{Grand Organization Project (GOP)}.
  31
  32 The regression tests are compiled using special @code{make}
  33 targets.  There are three primary uses for the regression
  34 tests.  First, successful completion of the regression tests means
  35 that LilyPond has been properly built.  Second, the output of the
  36 regression tests can be manually checked to ensure that
  37 the graphical output matches the description of the intended
  38 output.  Third, the regression test output from two different
  39 versions of LilyPond can be automatically compared to identify
  40 any differences.  These differences should then be manually
  41 checked to ensure that the differences are intended.
  42
  43 Regression tests (@qq{regtests}) are available in precompiled form
  44 as part of the documentation.  Regtests can also be compiled
  45 on any machine that has a properly configured LilyPond build
  46 system.
  47
  48
  49 @node Precompiled regression tests
  50 @section Precompiled regression tests
  51
  52 @subheading Regression test output
  53
  54 As part of the release process, the regression tests are run
  55 for every LilyPond release.  Full regression test output is
  56 available for every stable version and the most recent development
  57 version.
  58
  59 Regression test output is available in HTML and PDF format.  Links
  60 to the regression test output are available at the developer's
  61 resources page for the version of interest.
  62
  63 The latest stable version of the regtests is found at:
  64
  65 @example
  66 @uref{http://lilypond.org/doc/stable/input/regression/collated-files.html}
  67 @end example
  68
  69 The latest development version of the regtests is found at:
  70
  71 @example
  72 @uref{http://lilypond.org/doc/latest/input/regression/collated-files.html}
  73 @end example
  74
  75
  76 @subheading Regression test comparison
  77
  78 Each time a new version is released, the regtests are
  79 compiled and the output is automatically compared with the
  80 output of the previous release.  The result of these
  81 comparisons is archived online:
  82
  83 @example
  84 @uref{http://lilypond.org/test/}
  85 @end example
  86
  87 Checking these pages is a very important task for the LilyPond project.
  88 You are invited to report anything that looks broken, or any case
  89 where the output quality is not on par with the previous release,
  90 as described in @rweb{Bug reports}.
  91
  92 @warning{ The special regression test
  93 @file{test-output-distance.ly} will always show up as a
  94 regression.  This test changes each time it is run, and serves to
  95 verify that the regression tests have, in fact, run.}
  96
  97
  98 @subheading What to look for
  99
 100 The test comparison shows all of the changes that occurred between
 101 the current release and the prior release.  Each test that has a
 102 significant difference in output is displayed, with the old
 103 version on the left and the new version on the right.
 104
 105 Regression tests whose output is the same for both versions are
 106 not shown in the test comparison.
 107
 108 @itemize
 109 @item
 110 Images: green blurs in the new version show the approximate
 111 location of elements in the old version.
 112
 113 There are often minor adjustments in spacing which do not indicate
 114 any problem.
 115
 116 @item
 117 Log files: show the difference in command-line output.
 118
 119 The main thing to examine is any change in page counts: if a
 120 file used to fit on 1 page but now requires 4 or 5 pages,
 121 something is suspicious!
 122
 123 @item
 124 Profile files: the information these files provide is not yet
 125 documented here (TODO).
 126
 127 @end itemize
 128
 129 @warning{
 130 The automatic comparison of the regtests checks the LilyPond
 131 bounding boxes.  This means that Ghostscript changes and changes
 132 in lyrics or text are not found.
 133 }
 134
 135 @node Compiling regression tests
 136 @section Compiling regression tests
 137
 138 Developers may wish to see the output of the complete regression
 139 test suite for the current version of the source repository
 140 between releases.  Current source code is available; see
 141 @ref{Working with source code}.
 142
 143 For regression testing @code{../configure} should be run with the
 144 @code{--disable-optimising} option.  Then you will need
 145 to build the LilyPond binary; see @ref{Compiling LilyPond}.
 146
 147 Uninstalling the previous LilyPond version is not necessary, nor is
 148 running @code{make install}, since the tests will automatically be
 149 compiled with the LilyPond binary you have just built in your source
 150 directory.
 151
 152 From this point, the regtests are compiled with:
 153
 154 @example
 155 make test
 156 @end example
 157
 158 If you have a multi-core machine you may want to use the @option{-j}
 159 option and @var{CPU_COUNT} variable, as
 160 described in @ref{Saving time with CPU_COUNT}.
 161 For a quad-core processor the complete command would be:
 162
 163 @example
 164 make -j5 CPU_COUNT=5 test
 165 @end example
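
The job count in the command above can be derived from the core count
instead of being typed by hand.  A minimal sketch, assuming GNU
@command{nproc} is available and that one job more than the number of
cores is wanted (a generalization of the quad-core example, not a
documented rule):

@example
# Assumption: nproc (GNU coreutils) reports the core count; use 1 if missing.
cores=$(nproc 2>/dev/null || echo 1)
# Assumption: one job more than the core count, as in -j5 on a quad-core.
jobs=$((cores + 1))
cmd="make -j$jobs CPU_COUNT=$jobs test"
# Print the command for inspection, then run it from the build directory.
echo "$cmd"
@end example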
 166
 167 The regtest output will then be available in
 168 @file{input/regression/out-test}.
 169 @file{input/regression/out-test/collated-examples.html}
 170 contains a listing of all the regression tests that were run,
 171 but none of the images are included.  Individual images are
 172 also available in this directory.
 173
 174 The primary use of @samp{make@tie{}test} is to verify that the
 175 regression tests all run without error.  The regression test
 176 page that is part of the documentation is created only when the
 177 documentation is built, as described in @ref{Generating documentation}.
 178 Note that building the documentation requires more installed components
 179 than building the source code, as described in
 180 @ref{Requirements for building documentation}.
 181
 182
 183 @node Regtest comparison
 184 @section Regtest comparison
 185
 186 Before modified code is committed to @code{master} (via @code{staging}),
 187 a regression test
 188 comparison must be completed to ensure that the changes have
 189 not caused problems with previously working code.  The comparison
 190 is made automatically upon compiling the regression test suite
 191 twice.
 192
 193 @enumerate
 194
 195 @item
 196 Run @code{make} with current git master without any of your changes.
 197
 198 @item
 199 Before making changes to the code, establish a baseline for the comparison by
 200 going to the @file{lilypond-git/build/} directory and running:
 201
 202 @example
 203 make test-baseline
 204 @end example
 205
 206 @item
 207 Make your changes, or apply the patch(es) to consider.
 208
 209 @item
 210 Compile the source with @samp{make} as usual.
 211
 212 @item
 213 Check for unintentional changes to the regtests:
 214
 215 @example
 216 make check
 217 @end example
 218
 219 After this has finished, a regression test comparison will be
 220 available (relative to the current @file{build/} directory) at:
 221
 222 @example
 223 out/test-results/index.html
 224 @end example
 225
 226 For each regression test that differs between the baseline and the
 227 changed code, a regression test entry will be displayed.  Ideally,
 228 the only changes would be the changes that you were working on.
 229 If regressions are introduced, they must be fixed before
 230 committing the code.
 231
 232 @warning{
 233 The special regression test @file{test-output-distance.ly} will always
 234 show up as a regression.  This test changes each time it is run, and
 235 serves to verify that the regression tests have, in fact, run.}
 236
 237 @item
 238 If you are happy with the results, then stop now.
 239
 240 If you want to continue programming, then make any additional code
 241 changes, and continue.
 242
 243 @item
 244 Compile the source with @samp{make} as usual.
 245
 246 @item
 247 To re-check files that differed between the initial
 248 @samp{make@tie{}test-baseline} and your post-changes
 249 @samp{make@tie{}check}, run:
 250
 251 @example
 252 make test-redo
 253 @end example
 254
 255 This updates the regression list at @file{out/test-results/index.html}.
 256 It does @emph{not} redo @file{test-output-distance.ly}.
 257
 258 @item
 259 When all regressions have been resolved, the output list will be empty.
 260
 261 @item
 262 Once all regressions have been resolved, a final check should be completed
 263 by running:
 264
 265 @example
 266 make test-clean
 267 make check
 268 @end example
 269
 270 This cleans the results of the previous @samp{make@tie{}check}, then does the
 271 automatic regression comparison again.
 272
 273 @end enumerate
 274
 275 @advanced{
 276 Once a test baseline has been established, there is no need to run it again
 277 unless git master has changed.  In other words, if you work with several
 278 branches and want to do a regtest comparison for each of them, you can run
 279 @code{make test-baseline} with git master, check out some branch, run
 280 @code{make} and @code{make check}, then switch to another branch and run
 281 @code{make test-clean}, @code{make} and @code{make check} without running
 282 @code{make test-baseline} again.}
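
The multi-branch workflow can be sketched as a shell loop.  This is only
an illustration: the branch names @code{feature-a} and @code{feature-b}
are hypothetical, and the commands are assumed to be run from the
@file{build/} directory inside the git checkout.

@example
# Assumption: run from lilypond-git/build/.
# Establish the baseline once, from unmodified master.
git checkout master && make && make test-baseline

# feature-a and feature-b are hypothetical branch names.
for branch in feature-a feature-b; do
  # Discard the previous comparison, rebuild, and compare to the baseline.
  git checkout "$branch" && make test-clean && make && make check
  # Each run overwrites the results, so pause to inspect them.
  printf 'Inspect out/test-results/index.html for %s, then press Enter. ' "$branch"
  read -r _
done
@end example

Running @code{make test-clean} before every @code{make check} mirrors the
final check in the list above, so each comparison starts from a clean
state.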
 283
 284
 285 @node Finding the cause of a regression
 286 @section Finding the cause of a regression
 287
 288 Git has special functionality to help track down the exact
 289 commit that causes a problem.  See the git manual page for
 290 @code{git bisect}.  This is a job that non-programmers can do,
 291 although it requires familiarity with git, ability to compile
 292 LilyPond, and generally a fair amount of technical knowledge.  A
 293 brief summary is given below, but you may need to consult other
 294 documentation for in-depth explanations.
 295
 296 Even if you are not familiar with git or are not able to compile
 297 LilyPond you can still help to narrow down the cause of a
 298 regression simply by downloading the binary releases of different
 299 LilyPond versions and testing them for the regression.  Knowing
 300 which version of LilyPond first exhibited the regression is
 301 helpful to a developer as it shortens the @code{git bisect}
 302 procedure.
 303
 304 Once a problematic commit is identified, the programmers' job is
 305 much easier.  In fact, for most regression bugs, the majority of
 306 the time is spent simply finding the problematic commit.
 307
 308 More information is in @ref{Regression tests}.
 309
 310 @subheading git bisect setup
 311
 312 We need to set up the bisect for each problem we want to
 313 investigate.
 314
 315 Suppose we have an input file which compiled in version 2.13.32,
 316 but fails in version 2.13.38 and above.
 317
 318 @enumerate
 319 @item
 320 Begin the process:
 321
 322 @example
 323 git bisect start
 324 @end example
 325
 326 @item
 327 Give it the earliest known bad tag:
 328
 329 @example
 330 git bisect bad release/2.13.38-1
 331 @end example
 332
 333 (You can list the available tags with @code{git tag}.)
 334
 335 @item
 336 Give it the latest known good tag:
 337
 338 @example
 339 git bisect good release/2.13.32-1
 340 @end example
 341
 342 You should now see something like:
 343 @example
 344 Bisecting: 195 revisions left to test after this (roughly 8 steps)
 345 [b17e2f3d7a5853a30f7d5a3cdc6b5079e77a3d2a] Web: Announcement
 346 update for the new @qq{LilyPond Report}.
 347 @end example
 348
 349 @end enumerate
 350
 351 @subheading Running git bisect
 352
 353 @enumerate
 354
 355 @item
 356 Compile the source:
 357
 358 @example
 359 make
 360 @end example
 361
 362 @item
 363 Test your input file:
 364
 365 @example
 366 out/bin/lilypond test.ly
 367 @end example
 368
 369 @item
 370 Test results?
 371
 372 @itemize
 373 @item
 374 Does it crash, or is the output bad?  If so:
 375
 376 @example
 377 git bisect bad
 378 @end example
 379
 380 @item
 381 Does your input file produce good output?  If so:
 382
 383 @example
 384 git bisect good
 385 @end example
 386
 387 @end itemize
 388
 389 @item
 390 Once the exact problem commit has been identified, git will inform
 391 you with a message like:
 392
 393 @example
 394 6d28aebbaaab1be9961a00bf15a1ef93acb91e30 is the first bad commit
 395 %%% ... blah blah blah ...
 396 @end example
 397
 398 If a range of commits still remains, git automatically checks out
 399 a new version for you to test; go back to step 1.
 400
 401 @end enumerate
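
The compile-and-test loop above can also be automated with
@code{git bisect run}, which runs a command at each step and marks the
commit good when it exits with status 0, skips the commit on status 125,
and marks it bad on any other status up to 127.  A minimal sketch,
assuming the build directory is @file{build/} and the input file is
@file{build/test.ly}.  It only detects failures that make
@command{lilypond} exit with an error; output that merely looks wrong
still has to be checked by hand.

@example
# Run from the top source directory, after the setup steps above.
# Assumptions: build/ is the build directory, build/test.ly the input file.
# 125 = skip a commit that cannot be built; 1 = bad (also covers crashes,
# whose raw exit status above 127 would otherwise abort the bisection).
git bisect run sh -c 'cd build || exit 125; make || exit 125; out/bin/lilypond test.ly || exit 1'

# Return to the branch that was checked out before bisecting.
git bisect reset
@end example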
 402
 403 @subheading Recommendation: use two terminal windows
 404
 405 @itemize
 406 @item
 407 One window is open to the @code{build/} directory, and alternates
 408 between these commands:
 409
 410 @example
 411 make
 412 out/bin/lilypond test.ly
 413 @end example
 414
 415 @item
 416 One window is open to the top source directory, and alternates
 417 between these commands:
 418
 419 @example
 420 git bisect good
 421 git bisect bad
 422 @end example
 423
 424 @end itemize
 425
 426
 427 @node Memory and coverage tests
 428 @section Memory and coverage tests
 429
 430 In addition to the graphical output of the regression tests, it is
 431 possible to test memory usage and to determine how much of the source
 432 code has been exercised by the tests.
 433
 434 @subheading Memory usage
 435
 436 To track memory usage as part of this test, you will need
 437 GUILE CVS, and in particular the following patch:
 438 @smallexample
 439 @uref{http://www.lilypond.org/vc/old/gub.darcs/patches/guile-1.9-gcstats.patch}
 440 @end smallexample
 441
 442 @subheading Code coverage
 443
 444 To check the coverage of the test suite, run the following:
 445
 446 @example
 447 ./scripts/auxiliar/build-coverage.sh
 448 @emph{# uncovered files, least covered first}
 449 ./scripts/auxiliar/coverage.py  --summary out-cov/*.cc
 450 @emph{# consecutive uncovered lines, longest first}
 451 ./scripts/auxiliar/coverage.py  --uncovered out-cov/*.cc
 452 @end example
 453
 454
 455 @node MusicXML tests
 456 @section MusicXML tests
 457
 458
 459 LilyPond comes with a complete set of regtests for the
 460 @uref{http://www.musicxml.org/,MusicXML} language.  Originally
 461 developed to test @samp{musicxml2ly}, these regression tests
 462 can be used to test any MusicXML implementation.
 463
 464 The MusicXML regression tests are found at
 465 @file{input/regression/musicxml/}.
 466
 467 The output resulting from running these tests
 468 through @samp{musicxml2ly} followed by @samp{lilypond} is
 469 available in the LilyPond documentation:
 470
 471 @example
 472 @uref{http://lilypond.org/doc/latest/input/regression/musicxml/collated-files}
 473 @end example
 474
 475
 476 @node Grand Regression Test Checking
 477 @section Grand Regression Test Checking
 478
 479 @subheading What is this all about?
 480
 481 Regression tests (usually abbreviated @qq{regtests}) are a collection
 482 of @file{.ly} files used to check whether LilyPond is working correctly.
 483 Example: before version 2.15.12, breve noteheads had an incorrect width,
 484 which resulted in collisions with other objects.  After the issue was fixed,
 485 a small @file{.ly} file demonstrating the problem was added to the regression
 486 tests as proof that the fix works.  If someone accidentally breaks the
 487 breve width again, we will notice this in the output of that regression test.
 488
 489 @subheading How can I help?
 490
 491 We ask you to help us by checking one or two regtests from time to time.
 492 You don't need programming skills to do this, or even LilyPond skills;
 493 basic knowledge of music notation is enough, and checking one regtest takes
 494 less than a minute.  Simply go here:
 495
 496 @example
 497 @uref{http://www.philholmes.net/lilypond/regtests/}
 498 @end example
 499
 500 @subheading Some tips on checking regtests
 501
 502 @subsubheading Description text
 503
 504 The description should be clear even for a music beginner.
 505 If there are any special terms used in the description,
 506 they all should be explained in our @rglosnamed{Top, Music Glossary}
 507 or @rinternalsnamed{Top, Internals Reference}.
 508 Vague descriptions (like @qq{behaves well} or @qq{looks reasonable}) shouldn't be used.
 509
 510 @ignore
 511 this may be useful for advanced regtest checking
 512 @subsubheading Is the regtest straightforward and systematic?
 513
 514 Unfortunately, some regtests are poorly written.  A good regtest should be
 515 straightforward: it should be obvious what it checks and how.  Also, it
 516 usually shouldn't check everything at once.  For example, it's a bad idea to test
 517 accidental placement by constructing one huge chord with many suspended notes
 518 and loads of accidentals.  It's better to divide such a problem into a series
 519 of clearly separated cases.
 520 @end ignore