\documentclass{article}
\def\postMudelaExample{\setlength{\parindent}{1em}}
\title{LilyPond, a Music Typesetter}
\usepackage{musicnotes}
[THIS IS WORK IN PROGRESS. THIS IS NOT FINISHED]
\section{Introduction}

The Internet has become a popular medium for collaborative work on
information. Its success is partly due to its use of simple, text-based
formats. Examples of these formats are HTML and \LaTeX. Anyone can
produce or modify such files using nothing but a text editor, they are
easily processed with run-of-the-mill text tools, and they can be
integrated into other text-based formats.

Software for processing this information and presenting these formats
in an elegant form is freely available (Netscape, \LaTeX, etc.).
The ubiquity of the software and the simplicity of the formats have
revolutionised the way people publish text-based information.
In the field of performed music, where the presentation takes the form
of sheet music, such a revolution has not started yet. Let us review
some alternatives that have been available for transmitting sheet
music:
\item MIDI\cite{midi}. This format was designed for interchanging performances
of music; one should think of it as a glorified tape-recorder
format. It needs dedicated editors, since it is binary. It does
not provide enough information for producing musical scores: some of
the abstract musical content of what is performed is thrown away.

\item PostScript\cite{Postscript}. This format is a printer control
language. Printed musical scores can be transmitted in PostScript,
but once a score is converted to PostScript, it is virtually
impossible to modify the score in a meaningful way.

\item Formats for various notation programs. Notation programs either
work with binary formats (e.g., NIFF\cite{niff-web}), need specific
platforms (e.g., Sibelius\cite{sibelius}), are proprietary or
non-portable tools themselves (idem), produce inadequate output
(e.g., MUP\cite{mup}), are based on graphical content (e.g.,
MusixTeX\cite{musixtex1}), limit themselves to specific subdomains
(e.g., ABC\cite{abc2mtex}), or require considerable skill and
knowledge to use (e.g., SCORE\cite{score}).

\item SMDL\cite{smdl-web}. This is a very rich ASCII format that is
designed for storing many types of music. Unfortunately, there is
no implementation of a program to print music from SMDL available.
Moreover, SMDL is so verbose that it is not suitable for human
consumption.

\item TAB\cite{tablature-web}. Tab (short for tablature) is a popular
format for interchanging music over e-mail, but it can only be used
In summary, sheet music is not published and edited on a wide scale
across the Internet because no format for music
interchange exists that is:

\item open, i.e., with publicly available specifications.
\item based on ASCII, and therefore suitable for human consumption and
\item rich enough for producing publication-quality sheet music from
\item based on musical content (unlike, for example, PostScript), and
therefore suitable for making modifications.
\item accompanied by tools for processing it that are freely available
across multiple platforms.
With the creation of LilyPond, we have tried to create both a
convenient format for storing sheet music, and a portable,
high-quality implementation of a compiler that compiles the input
into a printable score. You can find a small example of LilyPond
input along with its output in Figure~\ref{fig:intro-fig}.
\begin[verbatim]{mudela}
\transpose c'' { c4 c4 g4 g4 a4 a4 g2 }
{ \clef "bass"; c4 c'4
\context Staff <e'2 {\stemdown c'4 c'4}> f'4 c'4 e'4 c'4 }
linewidth = -1.0\cm ;
\caption{A small example of LilyPond input}
\label{fig:intro-fig}
The input language encodes musical events (such as notes and rests) on
the basis of their time-ordering. For example, the grammar includes
constructs that specify that notes start simultaneously and that notes
are to be played in sequence. In this encoding, some context that is
present in sheet music is lost.
The compiler reconstructs the notation from the encoded music. Its
operation comprises four different steps (see
Figure~\ref{fig:intro-steps}).

\item[Parsing] During parsing, the input is converted into a syntax tree.

\item[Interpreting] In the \emph{interpreting} step, it is determined
which symbols have to be printed. Objects that correspond to
notation (\emph{graphical objects}) are created from the syntax tree
in this phase. Generally speaking, for every symbol printed there is
one graphical object. These objects are incomplete: their position
and their final shape are unknown.

The context that was lost by encoding the input in a language is
reconstructed during this conversion.
\item[Formatting] The next step is determining where symbols are to be
placed; this is called \emph{formatting}.
\item[Outputting] Finally, all graphical objects are output as PostScript or \TeX\ code.
\def\staffsym{\vbox to 16pt{
\hbox{\vrule width 1cm depth .2pt height .2pt}\nointerlineskip
\hbox{\vrule width 1cm depth .2pt height .2pt}\nointerlineskip
\hbox{\vrule width 1cm depth .2pt height .2pt}\nointerlineskip
\hbox{\vrule width 1cm depth .2pt height .2pt}\nointerlineskip
\hbox{\vrule width 1cm depth .2pt height .2pt}\nointerlineskip
\def\vspacer{\vbox to 20pt{\vss}}
\def\spacedhbox#1{\hbox{\ #1\ }}
{\spacedhbox{Input}\atop \hbox{\texttt{\{c8 c8\}}}} {\spacedhbox{Parsing}\atop\longrightarrow}
{\spacedhbox{Syntax tree}\atop\spacedhbox{\textsf{Sequential(Note,Note)}}}
{\spacedhbox{Interpreting}\atop\longrightarrow}\\
{\spacedhbox{Graphic objects}\atop\spacedhbox{\texttrebleclef \textquarterhead\texteighthflag\textquarterhead\texteighthflag \staffsym }}
{\spacedhbox{Formatting}\atop\longrightarrow}
{\spacedhbox{Formatted objects}\atop\hbox{
{\spacedhbox{Outputting}\atop\longrightarrow}
{\spacedhbox{PostScript code}\atop\hbox{\texttt{\%!PS-Adobe}\ldots}}
\caption{Parsing, Interpreting, Formatting and Outputting}
\label{fig:intro-steps}
The second step, the interpretation phase of the compiler, can be
manipulated as a separate entity: the interpretation process is
composed of many separate modules, and the behaviour of these modules is
parameterised. By recombining the interpretation modules
and changing parameter settings, the same piece of music can be
printed differently, as is shown in Figure~\ref{fig:intro-interpret}.
This makes it easy to extend the program. Moreover, it enables the
same music to be printed in different versions, e.g., in a conductor's
score and in extracted parts.
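The four compilation steps and the pluggable interpretation modules can be sketched schematically. The following Python sketch is purely illustrative (it is not LilyPond's actual code); function names such as \texttt{note\_head\_engraver} are assumptions modelled on the engraver names used in the examples.

```python
# Illustrative sketch of the pipeline: parsing, interpreting,
# formatting, outputting.  Not LilyPond's real implementation.

def parse(source):
    # Parsing: turn textual input like "c8 c8" into a syntax tree
    # (here simplified to a list of (pitch, duration) pairs).
    return [(tok[0], int(tok[1:])) for tok in source.split()]

def note_head_engraver(node):
    # One module of the interpretation step: create a note head
    # graphical object for every note in the syntax tree.
    return [{"type": "notehead", "pitch": node[0]}]

def stem_engraver(node):
    # Another module; removing it from the module list corresponds
    # to '\remove "Stem_engraver"' in Figure 2.
    return [{"type": "stem"}]

def interpret(tree, modules):
    # Interpreting: modules create incomplete graphical objects;
    # their position and final shape are not yet known.
    grobs = []
    for node in tree:
        for module in modules:
            grobs.extend(module(node))
    return grobs

def format_objects(grobs):
    # Formatting: decide positions (naive fixed spacing here).
    for i, grob in enumerate(grobs):
        grob["x"] = i * 10
    return grobs

def output(grobs):
    # Outputting: emit (pseudo-)PostScript for every formatted object.
    return "\n".join("%s at x=%d" % (g["type"], g["x"]) for g in grobs)

score = output(format_objects(
    interpret(parse("c8 c8"), [note_head_engraver, stem_engraver])))
```

Recombining the module list, as the sketch suggests, is what produces the alternative renderings shown in Figure~\ref{fig:intro-interpret}.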
\context GrandStaff <
\transpose c'' { c4 c4 g4 g4 a4 a4 g2 }
{ \clef "bass"; c4 c'4
\context Staff <e'2 {\stemdown c'4 c'4}> f'4 c'4 e'4 c'4 }
linewidth = -1.0\cm ;
\remove "Stem_engraver";
numberOfStaffLines = 3;
\caption{The interpretation phase can be manipulated: the same
music as in Figure~\ref{fig:intro-fig} is interpreted
differently: three staff lines and no stems.}
\label{fig:intro-interpret}
\section{Preliminaries}

To understand the rest of this article, it is necessary to know
something about music notation and music typography. Since both
communicate music, we will explain some characteristics of instruments
and western music that motivate notational constructs.
Music notation is meant to be read by human performers. They sing or
play instruments that can produce sounds of different pitches. These
sounds are called \emph{notes}. Additionally, the sounds can be
articulated in different ways, e.g., staccato (short and separated)
or legato (fluently bound together). The loudness of the notes can
also be varied. Changes in loudness are called \emph{dynamics}.

Silence is also an element of music. The musical term for
silence within music is \emph{rest}.
The basic unit of pitch is the \emph{octave}. The octave corresponds
to a frequency ratio of 1:2. For example, the pitch denoted by a'
(frequency: 440 hertz) is one octave lower than a'' (frequency: 880
hertz). Various instruments have a limited \emph{pitch range}; for
example, a trumpet has a range of about 2.5 octaves. Not all
instruments have ranges in the same register: a tuba also has a range
of 2.5 octaves, but the range of the tuba is much lower.

Musicology has a confusing mix of relative and absolute measures for
pitches: the term `octave' refers both to a difference between two
pitches (the frequency ratio of 1:2) and to a range of pitches. For
example, the term `one-lined octave' refers to the pitch range
between 261.6 Hz and 523.3 Hz.
The octave is divided into smaller pitch steps. In modern western
music, every octave is divided into twelve approximately equidistant
pitch steps, and each step is called a \emph{semitone}. Usually, the
pitches in a musical piece come from a smaller subset of these twelve
possible pitches. This smaller subset, along with the musical
functions of the pitches, is called the
\emph{tonality}\footnote{Tonality also refers to the relations between
and functions of certain pitches. Since these do not have any
impact on notation, we ignore this.} of the piece.
The standard tonality that forms the basis of music notation
(the key of C major) is a set of seven pitches within every octave.
Each of these seven is denoted by a name. In English, these names
are (in rising pitch) c, d, e, f, g, a and b. Pitches that
are a semitone higher or lower than one of these seven can be
represented by suffixing the name with `sharp' or `flat'
respectively (this is called a \emph{chromatic alteration}).
A pitch can therefore be fully specified by a combination of the
octave number, the note name and a chromatic alteration.
Figure~\ref{fig:intro-pitches} shows the relation between names and
\caption{Pitches in western music. The octave number is denoted
\label{fig:intro-pitches}
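The triple of note name, chromatic alteration and octave determines a frequency. As a hedged sketch (assuming the twelve equidistant semitones of equal temperament mentioned above, and the reference a' = 440 Hz), the computation looks like this; the function and its parameter names are ours, not part of any LilyPond interface:

```python
# Equal-tempered frequency from (name, alteration, octave).
# octave=1 is the one-lined octave containing a' (440 Hz);
# alteration counts sharps (+1) and flats (-1).
SEMITONES = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}

def frequency(name, alteration=0, octave=1):
    # distance in semitones from a', then apply the 1:2 octave ratio
    steps = SEMITONES[name] + alteration - SEMITONES["a"] + 12 * (octave - 1)
    return 440.0 * 2 ** (steps / 12.0)
```

For instance, \texttt{frequency("c")} gives about 261.6 Hz, the lower bound of the one-lined octave quoted earlier, and \texttt{frequency("a", 0, 2)} gives the 880 Hz of a''.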
Many instruments can produce more than one note at the same time, e.g.,
pianos and guitars. When more notes are played simultaneously, they
form a so-called \emph{chord}.

The unit of duration is the \emph{beat}. When playing, the tempo is
determined by setting the number of beats per minute. In western
music, beats are often stressed in a regular pattern: for example,
waltzes have a stress pattern of strong-weak-weak, i.e., every
note that starts on a `strong' beat is louder and has more pronounced
articulation. This stress pattern is called the \emph{meter}.
\subsection{Music notation}

Music notation is a system that tries to represent musical ideas
through printed symbols. Music notation has no precise definition,
but most conventions have been described in reference manuals on music
notation\cite{read-notation}.

In music notation, sounds and silences are represented by symbols
called notes and rests respectively.\footnote{These names serve a
double purpose: the same terms are used to denote the musical
concepts.} The shape of notes and rests indicates their duration
(see Figure~\ref{fig:noteshapes}) relative to the whole note.
\notes \transpose c''{ c\longa*1/4 c\breve*1/2 c1 c2 c4 c8 c16 c32 c64 }
\remove "Staff_symbol_engraver";
\remove "Time_signature_engraver";
% \remove "Bar_engraver";
\remove "Clef_engraver";
\notes \transpose c''\context Staff { r\longa*1/4 r\breve*1/2 r1 r2 r4 r8 r16 r32 r64 }
\remove "Staff_symbol_engraver";
\remove "Time_signature_engraver";
% \remove "Bar_engraver";
\remove "Clef_engraver";
\caption{Note and rest shapes encode the duration. At the top, notes
are shown; at the bottom, rests. From left to right: a quadruple
whole note (\emph{longa}), double whole note (\emph{breve}), whole, half,
quarter, eighth, sixteenth, thirty-second and sixty-fourth. Each
note has half the duration of its predecessor.}
\label{fig:noteshapes}
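The halving scheme in the caption can be written down directly. A minimal sketch, expressing each shape's duration as an exact fraction of the whole note:

```python
from fractions import Fraction

# Each shape has half the duration of its predecessor,
# measured relative to the whole note (longa = 4 whole notes).
names = ["longa", "breve", "whole", "half", "quarter",
         "eighth", "sixteenth", "thirty-second", "sixty-fourth"]
durations = {name: Fraction(4, 2 ** i) for i, name in enumerate(names)}
```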
Notes are printed in a grid of horizontal lines called a \emph{staff} to
denote their pitch: each line represents a pitch from the
standard scale (c, d, e, f, g, a, b). The reference point is the
\emph{clef}; e.g., the treble clef marks the location of the $g^1$
pitch. The notes are printed in their time order, from left to right.
a4 b c d e f g a \clef bass;
a4 b c d e f g a \clef alto;
a4 b c d e f g a \clef treble;
\paper { linewidth = 15.\cm; }
\caption{Pitches ranging from $a, b, c',\ldots a'$, in different
clefs. From left to right, the bass, alto and treble clef are
Chromatic alterations are indicated by printing a flat sign or a
sharp sign in front of the note head. If these chromatic alterations
occur systematically (if they are part of the tonality of the piece),
then this is indicated with a \emph{key signature}: a list of
sharp or flat signs printed next to the clef.
Articulation is notated by marking the note shapes: wedges, hats and
dots all indicate specific articulations. If the notes are to be
bound fluently (legato), the note shapes are encompassed by a smooth
curve called a \emph{slur}.
c'4-> c'4-. g'4 ( b'4 ) g''4
\caption{Some articulations. From left to right: extra stress
(\emph{marcato}), short (staccato), slurred notes (legato).}
\label{fig:articulation}
Dynamics are notated in two ways. Absolute dynamics are indicated by
letters: \textbf{f} (from Italian ``forte'') stands for loud, and
\textbf{p} (from Italian ``piano'') means soft. Gradual changes in
loudness are notated by (de)crescendos. These are hairpin-like shapes
g'4\pp \< g'4 \! g'4 \ff \> g'4 g' \! g'\ppp
\caption{Dynamics: start very soft (pp), grow to loud (ff) and
decrease to extremely soft (ppp)}
The meter is indicated by bar lines: every start of the stress pattern
is preceded by a vertical line, the \emph{bar line}. The space
between two bar lines is called a \emph{measure}; it is therefore the unit of
the rhythmic pattern.

The time signature indicates what kind of rhythmic pattern is
desired. The time signature takes the form of two numbers stacked
vertically. The top number is the number of beats in one measure; the
bottom number is the duration (relative to the whole note) of the note
that takes one beat. For example, a 2/4 time signature means ``two beats
per measure, and a quarter note takes one beat''.
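The arithmetic implied by the two stacked numbers can be made explicit. A small sketch (the function name is ours, chosen for illustration):

```python
from fractions import Fraction

def measure_length(beats, beat_value):
    # Total duration of one measure relative to the whole note:
    # the top number times the duration of one beat.
    return beats * Fraction(1, beat_value)
```

For a 2/4 signature this gives 1/2, i.e., one measure lasts as long as a half note; 6/8 and 3/4 measures come out equally long, even though their stress patterns differ.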
Chords are written by attaching multiple note heads to one stem. When
the composer wants to emphasise the horizontal relationships between
notes, the simultaneous notes can be written as voices (where every
note head has its own stem). A small example is given in
Figure~\ref{fig:simultaneous}.
\relative c'' {\time 2/4; <c4 e> <d f>
\context Staff < \context Voice = VA{
\context Voice = VB {
\stemup e4 f g8 g4 g8 } >
\caption{Notes sounding together. Chord notation (left, before
the bar line) emphasises vertical relations; voice notation
emphasises horizontal relations. Separate voices needn't have
synchronous rhythms (third measure).
\label{fig:simultaneous}
Separate voices do not have to share one rhythmic pattern---this is
also demonstrated in Figure~\ref{fig:simultaneous}---they are in a sense
independent. A different way to express this in notation is by
printing each voice on a different staff. This is customary when
writing for piano (the left and right hand each have a staff of their own)
and for ensembles (every instrument has a staff of its own).
\subsection{Music typography}

Music typography is the art of placing symbols in an esthetically
pleasing configuration. Little is explicitly known about music
typography; there are only a few reference works
available\cite{ross,wanske}. Most of the knowledge of this art has
been transmitted verbally, and was subsequently lost.

The motivation behind choices in typography is to represent the idea
as clearly as possible. Among others, this results in the following
\item The printed score should use visual hints to accentuate the
\item The printed score should not contain distracting elements, such
as large empty regions or blotted regions.
An example of the first guideline in action is horizontal spacing.
The amount of space following a note should reflect the duration of
that note: short notes get a small amount of space, long notes larger
amounts. Such spacing constraints can be subtle: the
``amount of space'' is only an impression that should be conveyed, so there
has to be some correction for optical illusions. See
Figure~\ref{fig:spacing}.
\relative c'' { \time 3/4; c16 c c c c8 c8 | f4 f, f' }
\caption{Spacing conveys information about duration. The sixteenth
notes at the left get less space than the quarter notes in the
middle. Spacing is ``visual'': there should be more space
after the first note of the last measure, and less after the second.}
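One common way to make space reflect duration without being strictly proportional to it is a logarithmic rule of thumb from engraving practice: doubling a note's length adds a constant increment of space. The sketch below illustrates that idea only; the constants and the formula are assumptions for illustration, not LilyPond's actual spacing algorithm.

```python
import math

def space_after(duration, base=10.0, unit=6.0, shortest=1 / 16):
    # Width in arbitrary units: a fixed base for every note, plus an
    # increment per doubling of the duration relative to the shortest
    # note.  A quarter note thus gets more space than a sixteenth,
    # but not four times as much.
    return base + unit * math.log2(duration / shortest)
```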
Another aspect of music typography is clearly visible in collisions.
When chords or separate voices are printed, notes that start at
the same time should be printed aligned (i.e., with the same $x$
position). If the pitches are close to each other, the note heads
would collide. To prevent this, some notes (or note heads) have to be
shifted horizontally. An example of this is given in
Figure~\ref{fig:collision}.
\label{fig:collision}
\bibliographystyle{hw-plain}
\bibliography{engraving,boeken,colorado,computer-notation,other-packages}
\section{Requirements}

The input format combines a symbolic representation of
music with a style sheet that describes how the symbolic representation
is converted to notation. The symbolic representation is based on a
context-free language called \textsf{music}. Music is a recursively
defined construction in the input language: it can be constructed by
combining lists of \textsf{music} sequentially or in parallel, or from
terminals like notes or lyrics.

The grammar for \textsf{music} is listed below. It has been edited to
leave out syntactic and ergonomic details.
Music: & SimpleMusic\\
& $|$ REPEATED int Music ALTERNATIVE MusicList\\
& $|$ SIMULTANEOUS MusicList\\
& $|$ SEQUENTIAL MusicList\\
& $|$ CONTEXT STRING '=' STRING Music\\
& $|$ TIMES int int Music \\
& $|$ TRANSPOSE PITCH Music \\
SimpleMusic: & Note\\
Command: & METERCHANGE\\
&$|$ PROPERTY STRING '=' STRING\\
Chord: &PitchList DURATION\\
Rest: &REST DURATION\\
Lyric: &STRING DURATION\\
Note: &PITCH DURATION\\
The terminals are either purely musical concepts that have a duration
and take a non-zero amount of musical time, such as notes and lyrics, or
commands that behave as if they have no duration.\footnote{The
PROPERTY command is a generic mechanism for controlling the
interpretation, i.e., the musical style sheets. See [forward ref].}
The nonterminal productions can
\item Some productions combine multiple elements: one can specify that
elements are to be played in sequence, simultaneously or repetitively.
\item There are productions for transposing music, and for dilating the
durations of music: the TIMES production can be used to encode a
triplet.\footnote{A triplet is a group of three notes marked by a
bracket, that are played 3/2 times faster.}
\item There are productions that give directions to the interpretation
engine (the CONTEXT production).
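The recursive construction that the grammar describes maps naturally onto a recursive data structure. The following sketch mirrors a few of the nonterminals above as Python classes; the class layout is our illustration of the grammar, not LilyPond's internal representation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Note:                 # terminal: PITCH DURATION
    pitch: str
    duration: int

@dataclass
class Sequential:           # SEQUENTIAL MusicList
    elements: List[object]

@dataclass
class Simultaneous:         # SIMULTANEOUS MusicList
    elements: List[object]

@dataclass
class Transpose:            # TRANSPOSE PITCH Music
    pitch: str
    music: object

# The fragment "\transpose c'' { c4 c4 }" as an instance of the
# recursive construction: Music built from a list of terminals.
example = Transpose("c''", Sequential([Note("c", 4), Note("c", 4)]))
```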
\section{Context in notation}

Music notation relies heavily on context. Notational symbols do not
have meaning if they are not surrounded by other context elements. In
this section we give some examples of how the reader uses this context to
derive the meaning of a piece of notation. We will focus on the prime
example of context: the staff.

A staff is a grid of five horizontal lines, but it contains more components:
\item A staff can have a key signature (printed at the left)
\item A staff can have a time signature (printed at the left)
\item A staff has bar lines
\item A staff has a clef (printed at the left)

It is still possible to print notes without these components, but one
cannot determine the meaning of the notes.
\notes \relative c' { \time 2/4; g'4 c,4 a'4 f4 e c d2 }
\remove "Time_signature_engraver";
% \remove "Bar_engraver";
\remove "Staff_symbol_engraver";
\remove "Clef_engraver";
\remove "Key_engraver";
As you can see, you can still make out the general form of the melody
and the rhythm that is to be played, but the notation is difficult to
read and the musical information is incomplete. The stress
pattern of the notes cannot be deduced from this output. For this, we
need a time signature. Adding bar lines helps with finding the strong
\notes \relative c' { \time 2/4; g'4 c,4 a'4 f4 e c d2 }
\remove "Staff_symbol_engraver";
\remove "Clef_engraver";
\remove "Key_engraver";}
It is still impossible to deduce the exact pitch of the notes. One needs a
clef to do so. Staff lines help the eye in determining the vertical
position of a note with respect to the clef.

\notes \relative c' {\clef alto; \time 2/4; g'4 c,4 a'4 f4 e c d2 }
Now you know the pitch of the notes: you look at the start of the line
and see a clef, and with this clef, you can determine the notated pitches.
You have found the \emph{context} in which the notation is to be
\section{Interpretation context}

Context (clef, time signature, etc.) determines the relationship
between music and its notation. Because LilyPond writes
notation, context works the other way around for LilyPond: with
context, a piece of music can be converted to notation.

A reader remembers this context while reading the notation from left
to right. By analogy, LilyPond constructs this context while
constructing notes from left to right. This is what happens in the
``Interpreting'' phase of Figure~\ref{fig:intro-steps}. In LilyPond, the
state of this context is a set of variables with their values; a staff
context contains variables like

\item current time signature

These variables determine when and how clefs, time signatures, bar
lines and accidentals are printed.
The staff is not the only form of context in notation. In polyphonic
music, the stem direction shows which notes form a voice: all notes of
the same voice have stems pointing in the same direction. The value
of this variable determines the appearance of the printed stems.

In LilyPond, this ``notation context'' is abstracted to a data structure
that is used, constructed and modified during the interpretation
phase. It contains context properties, and it is responsible for
creating notational elements: the Staff context creates symbols for
clefs, time signatures and key signatures. The Voice context creates
For the fragment of polyphonic music below,
\context Staff { c'4 < { \stemup c'4 } \context Voice = VB { \stemdown a4 } > }
a Staff context is created. Within this Staff context (which prints
the clef), a Voice context is created, which prints the first note.
Then, a second Voice context is created with its stem direction set to
``down'', and the direction of the other is set to ``up''. Both Voice
contexts are still part of the same Staff context.
In the same way, multiple-staff scores are created: within the Score
context, multiple Staff contexts are created. Every Staff context
creates the notation associated with a staff.
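The nesting of Score, Staff and Voice contexts, each holding its own property variables, can be sketched as a simple tree of property maps. This is an illustration of the idea only (the class and property names are our assumptions, not LilyPond's API):

```python
class Context:
    """A notation context: a named set of properties nested in a parent."""

    def __init__(self, name, parent=None, **properties):
        self.name, self.parent, self.children = name, parent, []
        self.properties = properties
        if parent:
            parent.children.append(self)

    def get(self, key):
        # A lookup walks outward through enclosing contexts, like a
        # reader recalling the clef and time signature seen earlier.
        if key in self.properties:
            return self.properties[key]
        return self.parent.get(key) if self.parent else None

# Score -> Staff -> two Voices, as in the polyphonic fragment above.
score = Context("Score", timeSignature="2/4")
staff = Context("Staff", score, clef="treble")
voice_a = Context("Voice", staff, stemDirection="up")
voice_b = Context("Voice", staff, stemDirection="down")
```

Both voices share the staff's clef and the score's time signature, while each keeps its own stem direction, mirroring how the two Voice contexts above remain part of the same Staff context.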
The complexity of music notation was tackled by adopting a modular
design: both the formatting system (which encodes the esthetic rules of
notation) and the interpretation system (which encodes the semantic
rules) are highly modular.

The difficulty in creating a format for music notation is rooted in
the fact that music is multi-dimensional: each sound has its own
duration, pitch, loudness and articulation. Additionally, multiple
sounds may be played simultaneously. Because of this, there is no
obvious way to ``flatten'' music into a context-free language.

The difficulty in creating a printing engine is rooted in the fact
that music notation is complicated: it is a very large graphical
``language'' with many arbitrary esthetic and semantic conventions.
Building a system that formats full-fledged musical notation is a
challenge in itself, regardless of whether it is part of a compiler or
not.

The fact that music and its notation are of a different nature
implies that the conversion between the two is non-trivial.

In LilyPond, we solved the above problems in the following way: