==================================================
 A Record of reStructuredText Syntax Alternatives
==================================================

:Contact: goodger@users.sourceforge.net
:Revision: $Revision: 4120 $
:Date: $Date: 2005-12-01 01:56:35 +0100 (Thu, 01 Dec 2005) $
:Copyright: This document has been placed in the public domain.

The following are ideas, alternatives, and justifications that were
considered for reStructuredText syntax, which did not originate with
Setext_ or StructuredText_. For an analysis of constructs which *did*
originate with StructuredText or Setext, please see `Problems With
StructuredText`_. See the `reStructuredText Markup Specification`_
for full details of the established syntax.

The ideas are divided into sections:

* Implemented_: already done. The issues and alternatives are
  recorded here for posterity.

* `Not Implemented`_: these ideas won't be implemented.

* Tabled_: these ideas should be revisited in the future.

* `To Do`_: these ideas should be implemented. They're just waiting
  for a champion to resolve issues and get them done.

* `... Or Not To Do?`_: possible but questionable. These probably
  won't be implemented, but you never know.

.. _Setext: http://docutils.sourceforge.net/mirror/setext.html
.. _StructuredText:
   http://www.zope.org/DevHome/Members/jim/StructuredTextWiki/FrontPage
.. _Problems with StructuredText: problems.html
.. _reStructuredText Markup Specification:
   ../../ref/rst/restructuredtext.html
Field Lists
===========

Prior to the syntax for field lists being finalized, several
alternatives were proposed.

1. Unadorned RFC822_ everywhere::

   Advantages: clean, precedent (RFC822-compliant). Disadvantage:
   ambiguous (these paragraphs are a prime example).

2. Special case: use unadorned RFC822_ for the very first or very last
   text block of a document::

       The rest of the document...

   Advantages: clean, precedent (RFC822-compliant). Disadvantages:
   special case, flat (unnested) field lists only, still ambiguous::

       Usage: cmdname [options] arg1 arg2 ...

   We obviously *don't* want the line above to be interpreted as a
   field list item. Or do we?

   Conclusion: rejected for the general case, accepted for specific
   contexts (PEPs, email).

3. Use a directive::

   Advantages: explicit and unambiguous, RFC822-compliant.
   Disadvantage: cumbersome.

   Conclusion: rejected for the general case (but such a directive
   could certainly be written).

4. Use Javadoc-style::

   Advantages: unambiguous, precedent, flexible. Disadvantages:
   non-intuitive, ugly, not RFC822-compliant.

   Conclusion: rejected.

5. Use leading colons::

   Advantages: unambiguous, obvious (*almost* RFC822-compliant),
   flexible, perhaps even elegant. Disadvantages: no precedent, not
   quite RFC822-compliant.

   Conclusion: accepted!

6. Use double colons::

   Advantages: unambiguous, obvious? (*almost* RFC822-compliant),
   flexible, similar to syntax already used for literal blocks and
   directives. Disadvantages: no precedent, not quite
   RFC822-compliant, similar to syntax already used for literal blocks
   and directives.

   Conclusion: rejected because of the syntax similarity & conflicts.

Why is RFC822 compliance important? It's a universal Internet
standard, and super obvious. Also, I'd like to support the PEP format
(ulterior motive: get PEPs to use reStructuredText as their standard).
But it *would* be easy to get used to an alternative (easy even to
convert PEPs; probably harder to convert python-deviants ;-).

Unfortunately, without well-defined context (such as in email headers:
RFC822 only applies before any blank lines), the RFC822 format is
ambiguous. It is very common in ordinary text. To implement field
lists unambiguously, we need explicit syntax.
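
The ambiguity can be made concrete with a small sketch. The two
regular expressions below are illustrative assumptions, not the
patterns used by the reStructuredText parser, and ``classify`` is a
hypothetical helper. The "Author: Me" sample is likewise made up for
the illustration. Ordinary prose such as the "Usage:" line matches the
unadorned RFC822 pattern, while the leading-colon pattern only fires
when the author has asked for a field explicitly.

```python
import re

# Illustrative patterns only (assumptions, not the parser's real rules).
# Unadorned RFC822 style: "Name: value"
RFC822_FIELD = re.compile(r"^[A-Za-z][\w-]*:\s+\S")
# Leading-colon style (alternative 5): ":Name: value"
COLON_FIELD = re.compile(r"^:[^:\s][^:]*:\s+\S")


def classify(line):
    """Report which candidate syntax would read `line` as a field."""
    return {
        "rfc822": bool(RFC822_FIELD.match(line)),
        "leading_colon": bool(COLON_FIELD.match(line)),
    }


if __name__ == "__main__":
    for sample in (
        "Author: Me",
        "Usage: cmdname [options] arg1 arg2 ...",
        "Advantages: clean, precedent (RFC822-compliant).",
        ":Author: Me",
    ):
        print(f"{sample!r}: {classify(sample)}")
```

Under these assumed patterns the "Usage:" and "Advantages:" lines are
both taken for RFC822 fields, which is exactly the false positive the
explicit syntax avoids.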

The following question was posed in a footnote:

    Should "bibliographic field lists" be defined at the parser level,
    or at the DPS transformation level? In other words, are they
    reStructuredText-specific, or would they also be applicable to
    another (many/every other?) syntax?

The answer is that bibliographic fields are a
reStructuredText-specific markup convention. Other syntaxes may
implement the bibliographic elements explicitly. For example, there
would be no need for such a transformation for an XML-based markup
syntax.

.. _RFC822: http://www.rfc-editor.org/rfc/rfc822.txt


Interpreted Text "Roles"
========================

The original purpose of interpreted text was as a mechanism for
descriptive markup, to describe the nature or role of a word or
phrase. For example, in XML we could say "<function>len</function>"
to mark up "len" as a function. It is envisaged that within Python
docstrings (inline documentation in Python module source files, the
primary market for reStructuredText) the role of a piece of
interpreted text can be inferred implicitly from the context of the
docstring within the program source. For other applications, however,
the role may have to be indicated explicitly.

Interpreted text is enclosed in single backquotes (`).

1. Initially, it was proposed that an explicit role could be indicated
   as a word or phrase within the enclosing backquotes:

   - As a prefix, separated by a colon and whitespace::

         `role: interpreted text`

   - As a suffix, separated by whitespace and a colon::

         `interpreted text :role`

   There are problems with the initial approach:

   - There could be ambiguity with interpreted text containing colons.
     For example, an index entry of "Mission: Impossible" would
     require a backslash-escaped colon.

   - The explicit role is descriptive markup, not content, and will
     not be visible in the processed output. Putting it inside the
     backquotes doesn't feel right; the *role* isn't being quoted.

2. Tony Ibbs suggested that the role be placed outside the
   backquotes::

       role:`prefix` or `suffix`:role

   This removes the embedded-colons ambiguity, but limits the role
   identifier to a single word (whitespace would be illegal).
   Since roles are not meant to be visible after processing, the lack
   of whitespace support is not important.

   The suggested syntax remains ambiguous with respect to ratios and
   some writing styles. For example, suppose there is a "signal"
   identifier, and we write::

       ...calculate the `signal`:noise ratio.

   "noise" looks like a role.

3. As an improvement on #2, we can bracket the role with colons::

       :role:`prefix` or `suffix`:role:

   This syntax is similar to that of field lists, which is fine since
   both are doing similar things: describing.

   This is the syntax chosen for reStructuredText.

4. Another alternative is two colons instead of one::

       role::`prefix` or `suffix`::role

   But this is used for analogies ("A:B::C:D": "A is to B as C is to
   D").

   Both alternatives #2 and #4 lack delimiters on both sides of the
   role, making it difficult to parse (by the reader).

5. Some kind of bracketing could be used::

       (role)`prefix` or `suffix`(role)

       {role}`prefix` or `suffix`{role}

       [role]`prefix` or `suffix`[role]

       <role>`prefix` or `suffix`<role>

   (The overlap of \*ML tags with angle brackets would be too
   confusing and precludes their use.)

Syntax #3 was chosen for reStructuredText.
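
To see why bracketing the role with colons (syntax #3) resolves the
"signal:noise" problem, here is a minimal sketch. The patterns and the
``find_roles`` helper are assumptions made for illustration; in
particular, the rule that a role name is a single run of letters,
digits, ".", "-" or "_" is a simplification rather than the parser's
actual definition.

```python
import re

# Assumed shape of a role name for this sketch: one word, no whitespace.
NAME = r"[A-Za-z0-9][A-Za-z0-9_.-]*"
# Syntax #3: the role is delimited by a colon on *both* sides.
PREFIX = re.compile(r":(?P<role>" + NAME + r"):`(?P<text>[^`]+)`")
SUFFIX = re.compile(r"`(?P<text>[^`]+)`:(?P<role>" + NAME + r"):")


def find_roles(text):
    """Return (role, interpreted text) pairs found with syntax #3."""
    found = [(m.group("role"), m.group("text")) for m in PREFIX.finditer(text)]
    found += [(m.group("role"), m.group("text")) for m in SUFFIX.finditer(text)]
    return found


if __name__ == "__main__":
    print(find_roles("...calculate the `signal`:noise ratio."))
    print(find_roles(":func:`len` and `Mission: Impossible`:index:"))
```

With both delimiters required, "noise" is no longer mistaken for a
role, and a colon inside the backquotes ("Mission: Impossible") needs
no escaping.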


Comments
========

A problem with comments (actually, with all indented constructs) is
that they cannot be followed by an indented block -- a block quote --
without swallowing it up.

I thought that perhaps comments should be one-liners only. But would
this mean that footnotes, hyperlink targets, and directives must then
also be one-liners? Not a good solution.

Tony Ibbs suggested a "comment" directive. I added that we could
limit a comment to a single text block, and that a "multi-block
comment" could use "comment-start" and "comment-end" directives. This
would remove the indentation incompatibility. A "comment" directive
automatically suggests "footnote" and (hyperlink) "target" directives
as well. This could go on forever! Bad choice.

Garth Kidd suggested that an "empty comment", a ".." explicit markup
start with nothing on the first line (except possibly whitespace) and
a blank line immediately following, could serve as an "unindent". An
empty comment does **not** swallow up indented blocks following it,
so block quotes are safe. "A tiny but practical wart." Accepted.


Anonymous Hyperlinks
====================

Alan Jaffray came up with this idea, along with the following syntax::

    Search the `Python DOC-SIG mailing list archives`{}_.

    .. _: http://mail.python.org/pipermail/doc-sig/

The idea is sound and useful. I suggested a "double underscore"
syntax::

    Search the `Python DOC-SIG mailing list archives`__.

    .. __: http://mail.python.org/pipermail/doc-sig/

But perhaps single underscores are okay? The syntax looks better, but
the hyperlink itself doesn't explicitly say "anonymous"::

    Search the `Python DOC-SIG mailing list archives`_.

    .. _: http://mail.python.org/pipermail/doc-sig/

Mixing anonymous and named hyperlinks becomes confusing. The order of
targets is not significant for named hyperlinks, but it is for
anonymous hyperlinks::

    Hyperlinks: anonymous_, named_, and another anonymous_.

Without the extra syntax of double underscores, determining which
hyperlink references are anonymous may be difficult. We'd have to
check which references don't have corresponding targets, and match
those up with anonymous targets. Keeping to a simple consistent
ordering (as with auto-numbered footnotes) seems simplest.
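
The "simple consistent ordering" can be sketched as positional
pairing: the first anonymous reference goes with the first anonymous
target, and so on. The patterns and the ``pair_anonymous`` helper
below are simplified assumptions for illustration, not the parser's
implementation. They recognize only double-underscore references and
the two target spellings discussed in this document (``.. __:`` and
``__``).

```python
import re

# Assumed, simplified patterns; real parsing has more rules than this.
ANON_REF = re.compile(r"`([^`]+)`__(?!_)|\b([A-Za-z0-9]+)__(?!\w)")
ANON_TARGET = re.compile(r"^(?:\.\. __:|__) +(\S+)\s*$")


def pair_anonymous(source):
    """Pair anonymous references with anonymous targets by position."""
    targets, body = [], []
    for line in source.splitlines():
        match = ANON_TARGET.match(line)
        if match:
            targets.append(match.group(1))
        else:
            body.append(line)
    # Normalize whitespace so phrase references may cross line boundaries.
    refs = [
        " ".join((m.group(1) or m.group(2)).split())
        for m in ANON_REF.finditer("\n".join(body))
    ]
    if len(refs) != len(targets):
        raise ValueError(
            f"{len(refs)} anonymous references but {len(targets)} targets"
        )
    return list(zip(refs, targets))


if __name__ == "__main__":
    sample = (
        "This is an anonymous reference__. Here is an anonymous\n"
        "`phrase reference`__.\n"
        "\n"
        "__ http://www.example.org/reference/\n"
        ".. __: http://www.example.org/phrase_reference/\n"
    )
    for ref, target in pair_anonymous(sample):
        print(f"{ref!r} -> {target}")
```

Because the double underscore marks a reference as anonymous on its
face, no lookup of "references without targets" is needed; a count
mismatch is simply an error.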

reStructuredText will use the explicit double-underscore syntax for
anonymous hyperlinks. An alternative (see `Reworking Explicit Markup
(Round 1)`_ below) for the somewhat awkward ".. __:" syntax is "__"::

    An anonymous__ reference.


Reworking Explicit Markup (Round 1)
===================================

Alan Jaffray came up with the idea of `anonymous hyperlinks`_, added
to reStructuredText. Subsequently it was asserted that hyperlinks
(especially anonymous hyperlinks) would play an increasingly important
role in reStructuredText documents, and therefore they require a
simpler and more concise syntax. This prompted a review of the
current and proposed explicit markup syntaxes with regard to

1. Original syntax::

       .. _blah:                     internal hyperlink target
       .. _blah: http://somewhere    external hyperlink target
       .. _blah: blahblah_           indirect hyperlink target
       .. __:                        anonymous internal target
       .. __: http://somewhere       anonymous external target
       .. __: blahblah_              anonymous indirect target
       .. [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   The comment text was intentionally made to look like a hyperlink
   target.

   * Except for the colon (a delimiter necessary to allow for
     phrase-links), hyperlink target ``.. _blah:`` comes from Setext.
   * Comment syntax from Setext.
   * Footnote syntax from StructuredText ("named links").
   * Directives and anonymous hyperlinks original to reStructuredText.

   + Consistent explicit markup indicator: "..".
   + Consistent hyperlink syntax: ".. _" & ":".

   - Anonymous target markup is awkward: ".. __:".
   - The explicit markup indicator ("..") is excessively overloaded?
   - Comment text is limited (can't look like a footnote, hyperlink,
     or directive). But this is probably not important.

2. Alan Jaffray's proposed syntax #1::

       __ _blah                      internal hyperlink target
       __ blah: http://somewhere     external hyperlink target
       __ blah: blahblah_            indirect hyperlink target
       __                            anonymous internal target
       __ http://somewhere           anonymous external target
       __ blahblah_                  anonymous indirect target
       __ [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   The hyperlink-connoted underscores have become first-level syntax.

   + Anonymous targets are simpler.
   + All hyperlink targets are one character shorter.

   - Inconsistent internal hyperlink targets. Unlike all other named
     hyperlink targets, there's no colon. There's an extra leading
     underscore, but we can't drop it because without it, "blah" looks
     like a relative URI. Unless we restore the colon::

         __ blah:                      internal hyperlink target

3. Alan Jaffray's proposed syntax #2::

       .. _blah                      internal hyperlink target
       .. blah: http://somewhere     external hyperlink target
       .. blah: blahblah_            indirect hyperlink target
       ..                            anonymous internal target
       .. http://somewhere           anonymous external target
       .. blahblah_                  anonymous indirect target
       .. [blah] http://somewhere    footnote
       !! blah: http://somewhere     directive
       ## blah: http://somewhere     comment

   Leading underscores have been (almost) replaced by "..", while
   comments and directives have gained their own syntax.

   + Anonymous hyperlinks are simpler.
   + Unique syntax for comments. Connotation of "comment" from
     some programming languages (including our favorite).
   + Unique syntax for directives. Connotation of "action!".

   - Inconsistent internal hyperlink targets. Again, unlike all other
     named hyperlink targets, there's no colon. There's a leading
     underscore, matching the trailing underscores of references,
     which no other hyperlink targets have. We can't drop that one
     leading underscore though: without it, "blah" looks like a
     relative URI. Again, unless we restore the colon::

         .. blah:                      internal hyperlink target

   - All (except for internal) hyperlink targets lack their leading
     underscores, losing the "hyperlink" connotation.

   - Obtrusive syntax for comments. Alternatives::

         ;; blah: http://somewhere
            (also comment syntax in Lisp & others)
         ,, blah: http://somewhere
            ("comma comma": sounds like "comment"!)

   - Iffy syntax for directives. Alternatives?

4. Tony Ibbs' proposed syntax::

       .. _blah:                     internal hyperlink target
       .. _blah: http://somewhere    external hyperlink target
       .. _blah: blahblah_           indirect hyperlink target
       ..                            anonymous internal target
       .. http://somewhere           anonymous external target
       .. blahblah_                  anonymous indirect target
       .. [blah] http://somewhere    footnote
       .. blah:: http://somewhere    directive
       .. blah: http://somewhere     comment

   This is the same as the current syntax, except for anonymous
   targets which drop their "__: ".

   + Anonymous targets are simpler.

   - Anonymous targets lack their leading underscores, losing the
     "hyperlink" connotation.
   - Anonymous targets are almost indistinguishable from comments.
     (Better to know "up front".)

5. David Goodger's proposed syntax: Perhaps going back to one of
   Alan's earlier suggestions might be the best solution. How about
   simply adding "__ " as a synonym for ".. __: " in the original
   syntax? These would become equivalent::

       .. __:                        anonymous internal target
       .. __: http://somewhere       anonymous external target
       .. __: blahblah_              anonymous indirect target

       __                            anonymous internal target
       __ http://somewhere           anonymous external target
       __ blahblah_                  anonymous indirect target

Alternative 5 has been adopted.


Backquotes in Phrase-Links
==========================

[From a 2001-06-05 Doc-SIG post in reply to questions from Doug

The first draft of the spec, posted to the Doc-SIG in November 2000,
used square brackets for phrase-links. I changed my mind because:

1. In the first draft, I had already decided on single-backquotes for
   inline literal text.

2. However, I wanted to minimize the necessity for backslash escapes,
   for example when quoting Python repr-equivalent syntax that uses
   backquotes.

3. The processing of identifiers (function/method/attribute/module
   etc. names) into hyperlinks is a useful feature. PyDoc recognizes
   identifiers heuristically, but it doesn't take much imagination to
   come up with counter-examples where PyDoc's heuristics would result
   in embarrassing failure. I wanted to do it deterministically, and
   that called for syntax. I called this construct "interpreted
   text".

4. Leveraging off the ``*emphasis*/**strong**`` syntax led to the
   idea of using double-backquotes as syntax.

5. I worked out some rules for inline markup recognition.

6. In combination with #5, double backquotes lent themselves to inline
   literals, neatly satisfying #2, minimizing backslash escapes. In
   fact, the spec says that no interpretation of any kind is done
   within double-backquote inline literal text; backslashes do *no*
   escaping within literal text.

7. Single backquotes are then freed up for interpreted text.

8. I already had square brackets required for footnote references.

9. Since interpreted text will typically turn into hyperlinks, it was
   a natural fit to use backquotes as the phrase-quoting syntax for
   trailing-underscore hyperlinks.

The original inspiration for the trailing underscore hyperlink syntax
was Setext. But for phrases Setext used a very cumbersome
``underscores_between_words_like_this_`` syntax.

The underscores can be viewed as if they were right-pointing arrows:
``-->``. So ``hyperlink_`` points away from the reference, and
``.. _hyperlink:`` points toward the target.


Substitution Mechanism
======================

Substitutions arose out of a Doc-SIG thread begun on 2001-10-28 by
Alan Jaffray, "reStructuredText inline markup". It reminded me of a
missing piece of the reStructuredText puzzle, first referred to in my
contribution to "Documentation markup & processing / PEPs" (Doc-SIG

Substitutions allow the power and flexibility of directives to be
shared by inline text. They are a way to allow arbitrarily complex
inline objects, while keeping the details out of the flow of text.
They are the equivalent of SGML/XML's named entities. For example, an
inline image (using reference syntax alternative 4d (vertical bars)
and definition alternative 3, the alternatives chosen for inclusion in
the spec)::

    The |biohazard| symbol must be used on containers used to dispose

    .. |biohazard| image:: biohazard.png

The ``|biohazard|`` substitution reference will be replaced in-line by
whatever the ``.. |biohazard|`` substitution definition generates (in
this case, an image). A substitution definition contains the
substitution text bracketed with vertical bars, followed by an
embedded inline-compatible directive, such as "image". A transform is
required to complete the substitution.
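
The transform step might look something like the following sketch.
Everything here is an assumption made for illustration: the two
regular expressions are simplified, ``resolve`` is a hypothetical
function, and ``render_placeholder`` merely stands in for running a
real directive (it returns a bracketed string instead of an image
node). Each definition is evaluated where it is defined, and the
result is then substituted at every matching reference.

```python
import re

# Assumed, simplified patterns for this sketch only.
SUB_DEF = re.compile(r"^\.\. \|([^|\s][^|]*)\| ([A-Za-z][\w-]*):: *(.*)$")
SUB_REF = re.compile(r"\|([^|\s][^|]*)\|")


def render_placeholder(directive, argument):
    """Hypothetical stand-in for running a directive such as "image"."""
    return f"[{directive}: {argument}]"


def resolve(source, render=render_placeholder):
    """Replace |name| references with the output of their definitions."""
    definitions, body = {}, []
    for line in source.splitlines():
        match = SUB_DEF.match(line)
        if match:
            name, directive, argument = match.groups()
            # Evaluate the embedded directive where it is defined ...
            definitions[name] = render(directive, argument)
        else:
            body.append(line)
    text = "\n".join(body).rstrip()
    # ... then substitute the result at each reference.  Unknown names
    # are left untouched in this sketch.
    return SUB_REF.sub(lambda m: definitions.get(m.group(1), m.group(0)), text)


if __name__ == "__main__":
    sample = (
        "The |biohazard| symbol...\n"
        "\n"
        ".. |biohazard| image:: biohazard.png\n"
    )
    print(resolve(sample))
```

Keeping the definition out of the paragraph is the point: the running
text carries only the short ``|biohazard|`` reference, however complex
the directive behind it.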

Syntax alternatives for the reference:

1. Use the existing interpreted text syntax, with a predefined role
   such as "sub"::

       The `biohazard`:sub: symbol...

   Advantages: existing syntax, explicit. Disadvantages: verbose,

2. Use a variant of the interpreted text syntax, with a new suffix
   akin to the underscore in phrase-link references::

   Due to incompatibility with other constructs and ordinary text
   usage, (f) and (g) are not possible.

3. Use interpreted text syntax with a fixed internal format::

   To avoid ML confusion (k) and (l) are definitely out. Square
   brackets (j) won't work in the target (the substitution definition
   would be indistinguishable from a footnote).

   The ```/name/``` syntax (g) is reminiscent of "s/find/sub"
   substitution syntax in ed-like languages. However, it may have a
   misleading association with regexps, and looks like an absolute
   POSIX path. (i) is visually equivalent and lacking the

   A disadvantage of all of these is that they limit interpreted text,
   albeit only slightly.

4. Use specialized syntax, something new::

   "#" (a) and "@" (b) are obtrusive. "/" (c) without backquotes
   looks just like a POSIX path; it is likely for such usage to appear

   "|" (d) and "^" (h) are feasible.

5. Redefine the trailing underscore syntax. See definition syntax
   alternative 4, below.

Syntax alternatives for the definition:

1. Use the existing directive syntax, with a predefined directive such
   as "sub". It contains a further embedded directive resolving to an
   inline-compatible object::

       .. image:: biohazard.png

       That bird wouldn't *voom* if you put 10,000,000 volts

   The advantages and disadvantages are the same as in inline

2. Use syntax as in #1, but with an embedded directive compressed::

       .. sub:: biohazard image:: biohazard.png

   This is a bit better than alternative 1, but still too much.

3. Use a variant of directive syntax, incorporating the substitution
   text, obviating the need for a special "sub" directive name. If we
   assume reference alternative 4d (vertical bars), the matching
   definition would look like this::

       .. |biohazard| image:: biohazard.png

4. (Suggested by Alan Jaffray on Doc-SIG on 2001-11-06.)

   Instead of adding new syntax, redefine the trailing underscore
   syntax to mean "substitution reference" instead of "hyperlink
   reference". Alan's example::

       I had lunch with Jonathan_ today. We talked about Zope_.

       .. _Jonathan: lj [user=jhl]
       .. _Zope: http://www.zope.org/

   A problem with the proposed syntax is that URIs which look like
   simple reference names (alphanum plus ".", "-", "_") would be
   indistinguishable from substitution directive names. A more
   consistent syntax would be::

       I had lunch with Jonathan_ today. We talked about Zope_.

       .. _Jonathan: lj:: user=jhl
       .. _Zope: http://www.zope.org/

   (``::`` after ``.. _Jonathan: lj``.)

   The "Zope" target is a simple external hyperlink, but the
   "Jonathan" target contains a directive. Alan proposed that the
   reference text be replaced by whatever the referenced directive
   (the "directive target") produces. A directive reference becomes a
   hyperlink reference if the contents of the directive target resolve
   to a hyperlink. If the directive target resolves to an icon, the
   reference is replaced by an inline icon. If the directive target
   resolves to a hyperlink, the directive reference becomes a
   hyperlink reference.

   This seems too indirect and complicated for easy comprehension.

   The reference in the text will sometimes become a link, sometimes
   not. Sometimes the reference text will remain, sometimes not. We
   don't know *at the reference*::

       This is a `hyperlink reference`_; its text will remain.
       This is an `inline icon`_; its text will disappear.

The syntax that has been incorporated into the spec and parser is
reference alternative 4d with definition alternative 3::

    The |biohazard| symbol...

    .. |biohazard| image:: biohazard.png

We can also combine substitution references with hyperlink references,
by appending a "_" (named hyperlink reference) or "__" (anonymous
hyperlink reference) suffix to the substitution reference. This
allows us to click on an image-link::

    The |biohazard|_ symbol...

    .. |biohazard| image:: biohazard.png

    .. _biohazard: http://www.cdc.gov/

There have been several suggestions for the naming of these
constructs, originally called "substitution references" and

1. Candidate names for the reference construct:

   (a) substitution reference
   (b) tagging reference
   (c) inline directive reference
   (d) directive reference
   (e) indirect inline directive reference
   (f) inline directive placeholder
   (g) inline directive insertion reference
   (h) directive insertion reference
   (i) insertion reference
   (j) directive macro reference
   (l) substitution directive reference

2. Candidate names for the definition construct:

   (b) substitution directive
   (g) inline directive definition
   (h) referenced directive
   (i) indirect directive
   (j) indirect directive definition
   (k) directive definition
   (l) indirect inline directive
   (m) named directive definition
   (n) inline directive insertion definition
   (o) directive insertion definition
   (p) insertion definition
   (q) insertion directive
   (r) substitution definition
   (s) directive macro definition
   (u) substitution directive definition
   (v) substitution definition

"Inline directive reference" (1c) seems to be an appropriate term at
first, but the term "inline" is redundant in the case of the
reference. Its counterpart "inline directive definition" (2g) is
awkward, because the directive definition itself is not inline.

"Directive reference" (1d) and "directive definition" (2k) are too
vague. "Directive definition" could be used to refer to any
directive, not just those used for inline substitutions.

One meaning of the term "macro" (1k, 2s, 2t) is too
programming-language-specific. Also, macros are typically simple text
substitution mechanisms: the text is substituted first and evaluated
later. reStructuredText substitution definitions are evaluated in
place at parse time and substituted afterwards.

"Insertion" (1h, 1i, 2n-2q) is almost right, but it implies that
something new is getting added rather than one construct being
replaced by another.

Which brings us back to "substitution". The overall best names are
"substitution reference" (1a) and "substitution definition" (2v). A
long way to go to add one word!


Inline External Targets
=======================

Currently reStructuredText has two hyperlink syntax variations:

* Named hyperlinks::

      This is a named reference_ of one word ("reference"). Here is
      a `phrase reference`_. Phrase references may even cross `line
      boundaries`_.

      .. _reference: http://www.example.org/reference/
      .. _phrase reference: http://www.example.org/phrase_reference/
      .. _line boundaries: http://www.example.org/line_boundaries/

  Advantages:

  - The plaintext is readable.
  - Each target may be reused multiple times (e.g., just write
    ``"reference_"`` again).
  - No synchronized ordering of references and targets is necessary.

  Disadvantages:

  - The reference text must be repeated as target names; could lead
  - The target URLs may be located far from the references, and hard
    to find in the plaintext.

* Anonymous hyperlinks (in current reStructuredText)::

      This is an anonymous reference__. Here is an anonymous
      `phrase reference`__. Phrase references may even cross `line
      boundaries`__.

      __ http://www.example.org/reference/
      __ http://www.example.org/phrase_reference/
      __ http://www.example.org/line_boundaries/

  Advantages:

  - The plaintext is readable.
  - The reference text does not have to be repeated.

  Disadvantages:

  - References and targets must be kept in sync.
  - Targets cannot be reused.
  - The target URLs may be located far from the references.

For comparison and historical background, StructuredText also has two
syntaxes for hyperlinks:

* First, ``"reference text":URL``::

      This is a "reference":http://www.example.org/reference/
      of one word ("reference"). Here is a "phrase
      reference":http://www.example.org/phrase_reference/.

* Second, ``"reference text", http://example.com/absolute_URL``::

      This is a "reference", http://www.example.org/reference/
      of one word ("reference"). Here is a "phrase reference",
      http://www.example.org/phrase_reference/.

Both syntaxes share advantages and disadvantages:

Advantages:

- The target is specified immediately adjacent to the reference.

Disadvantages:

- Poor plaintext readability.
- Targets cannot be reused.
- Both syntaxes use double quotes, common in ordinary text.
- In the first syntax, the URL and the last word are stuck
  together, exacerbating the line wrap problem.
- The second syntax is too magical; text could easily be written
  that way by accident (although only absolute URLs are recognized
  here, perhaps because of the potential for ambiguity).

A new type of "inline external hyperlink" has been proposed.

1. On 2002-06-28, Simon Budig proposed__ a new syntax for
   reStructuredText hyperlinks::

       This is a reference_(http://www.example.org/reference/) of one
       word ("reference"). Here is a `phrase
       reference`_(http://www.example.org/phrase_reference/). Are
       these examples, (single-underscore), named? If so, `anonymous
       references`__(http://www.example.org/anonymous/) using two
       underscores would probably be preferable.

   __ http://mail.python.org/pipermail/doc-sig/2002-June/002648.html

   The syntax, advantages, and disadvantages are similar to those of
   StructuredText.

   Advantages:

   - The target is specified immediately adjacent to the reference.

   Disadvantages:

   - Poor plaintext readability.
   - Targets cannot be reused (unless named, but the semantics are
     unclear).

   - The ``"`ref`_(URL)"`` syntax forces the last word of the
     reference text to be joined to the URL, making a potentially
     very long word that can't be wrapped (URLs can be very long).
     The reference and the URL should be separate. This is a
     symptom of the following point:

   - The syntax produces a single compound construct made up of two
     equally important parts, *with syntax in the middle*, *between*
     the reference and the target. This is unprecedented in
     reStructuredText.

   - The "inline hyperlink" text is *not* a named reference (there's
     no lookup by name), so it shouldn't look like one.

   - According to the IETF standards RFC 2396 and RFC 2732,
     parentheses are legal URI characters and curly braces are legal
     email characters, making their use prohibitively difficult.

   - The named/anonymous semantics are unclear.

2. After an analysis__ of the syntax of (1) above, we came up with the
   following compromise syntax::

       This is an anonymous reference__
       __<http://www.example.org/reference/> of one word
       ("reference"). Here is a `phrase reference`__
       __<http://www.example.org/phrase_reference/>. `Named
       references`_ _<http://www.example.org/anonymous/> use single
       underscores.

   __ http://mail.python.org/pipermail/doc-sig/2002-July/002670.html

   The syntax builds on that of the existing "inline internal
   targets": ``an _`inline internal target`.``

   Advantages:

   - The target is specified immediately adjacent to the reference,
     improving maintainability:

     - References and targets are easily kept in sync.
     - The reference text does not have to be repeated.

   - The construct is executed in two parts: references identical to
     existing references, and targets that are new but not too big a
     stretch from current syntax.

   - There's overwhelming precedent for quoting URLs with angle
     brackets.

   Disadvantages:

   - Poor plaintext readability.
   - Lots of "line noise".
   - Targets cannot be reused (unless named; see below).

   To alleviate the readability issue slightly, we could allow the
   target to appear later, such as after the end of the sentence::

       This is a named reference__ of one word ("reference").
       __<http://www.example.org/reference/> Here is a `phrase
       reference`__. __<http://www.example.org/phrase_reference/>

   Problem: this could only work for one reference at a time
   (reference/target pairs must be proximate [refA trgA refB trgB],
   not interleaved [refA refB trgA trgB] or nested [refA refB trgB
   trgA]). This variation is too problematic; references and inline
   external targets will have to be kept immediately adjacent (see (3)
   below).

   The ``"reference__ __<target>"`` syntax is actually for "anonymous
   inline external targets", emphasized by the double underscores. It
   follows that single trailing and leading underscores would lead to
   *implicitly named* inline external targets. This would allow the
   reuse of targets by name. So after ``"reference_ _<target>"``,
   another ``"reference_"`` would point to the same target.

   From RFC 2396 (URI syntax):

       The angle-bracket "<" and ">" and double-quote (")
       characters are excluded [from URIs] because they are often
       used as the delimiters around URI in text documents and
       protocol fields.

       Using <> angle brackets around each URI is especially
       recommended as a delimiting style for URI that contain
       whitespace.

   From RFC 822 (email headers):

       Angle brackets ("<" and ">") are generally used to indicate
       the presence of a one machine-usable reference (e.g.,
       delimiting mailboxes), possibly including source-routing to
       the machine.
1028
3. If it is best for references and inline external targets to be
1029
immediately adjacent, then they might as well be integrated.
1030
Here's an alternative syntax embedding the target URL in the
1033
This is an anonymous `reference <http://www.example.org
1034
/reference/>`__ of one word ("reference"). Here is a `phrase
1035
reference <http://www.example.org/phrase_reference/>`__.
1037
Advantages and disadvantages are similar to those in (2).
1038
Readability is still an issue, but the syntax is a bit less
1039
heavyweight (reduced line noise). Backquotes are required, even
1040
for one-word references; the target URL is included within the
1041
reference text, forcing a phrase context.
1043
We'll call this variant "embedded URIs".
1045
Problem: how to refer to a title like "HTML Anchors: <a>" (which
1046
ends with an HTML/SGML/XML tag)? We could either require more
1047
syntax on the target (like ``"`reference text
1048
__<http://example.com/>`__"``), or require the odd conflicting
1049
title to be escaped (like ``"`HTML Anchors: \<a>`__"``). The
1050
latter seems preferable, and not too onerous.
1052
Similarly to (2) above, a single trailing underscore would convert
1053
the reference & inline external target from anonymous to implicitly
1054
named, allowing reuse of targets by name.
1056
I think this is the least objectionable of the syntax alternatives.
1058
Other syntax variations have been proposed (by Brett Cannon and Benja
1061
`phrase reference`->http://www.example.com
1063
`phrase reference`@http://www.example.com
1065
`phrase reference`__ ->http://www.example.com
1067
`phrase reference` [-> http://www.example.com]
1069
`phrase reference`__ [-> http://www.example.com]
1071
`phrase reference` <http://www.example.com>_
1073
None of these variations are clearly superior to #3 above. Some have
1074
problems that exclude their use.
1076
With any kind of inline external target syntax it comes down to the
1077
conflict between maintainability and plaintext readability. I don't
1078
see a major problem with reStructuredText's maintainability, and I
1079
don't want to sacrifice plaintext readability to "improve" it.
1081
The proponents of inline external targets want them for easily
1082
maintainable web pages. The arguments go something like this:
1084
- Named hyperlinks are difficult to maintain because the reference
1085
text is duplicated as the target name.
1087
To which I said, "So use anonymous hyperlinks."
1089
- Anonymous hyperlinks are difficult to maintain because the
1090
references and targets have to be kept in sync.
1092
"So keep the targets close to the references, grouped after each
1093
paragraph. Maintenance is trivial."
1095
- But targets grouped after paragraphs break the flow of text.
1097
"Surely less than URLs embedded in the text! And if the intent is
1098
to produce web pages, not readable plaintext, then who cares about
1101
Many participants have voiced their objections to the proposed syntax:
1103
Garth Kidd: "I strongly prefer the current way of doing it.
1104
Inline is spectacularly messy, IMHO."
1106
Tony Ibbs: "I vehemently agree... that the inline alternatives
1107
being suggested look messy - there are/were good reasons they've
1108
been taken out... I don't believe I would gain from the new
1111
Paul Moore: "I agree as well. The proposed syntax is far too
1112
punctuation-heavy, and any of the alternatives discussed are
1113
ambiguous or too subtle."
1115
Others have voiced their support:
1117
fantasai: "I agree with Simon. In many cases, though certainly
1118
not in all, I find parenthesizing the url in plain text flows
1119
better than relegating it to a footnote."
1121
Ken Manheimer: "I'd like to weigh in requesting some kind of easy,
1122
direct inline reference link."
1124
(Interesting that those *against* the proposal have been using
1125
reStructuredText for a while, and those *for* the proposal are either
1126
new to the list ["fantasai", background unknown] or longtime
1127
StructuredText users [Ken Manheimer].)
1129
I was initially ambivalent/against the proposed "inline external
1130
targets". I value reStructuredText's readability very highly, and
1131
although the proposed syntax offers convenience, I don't know if the
1132
convenience is worth the cost in ugliness. Does the proposed syntax
1133
compromise readability too much, or should the choice be left up to
1134
the author? Perhaps if the syntax is *allowed* but its use strongly
1135
*discouraged*, for aesthetic/readability reasons?
1137
After a great deal of thought and much input from users, I've decided
1138
that there are reasonable use cases for this construct. The
1139
documentation should strongly caution against its use in most
1140
situations, recommending independent block-level targets instead.
1141
Syntax #3 above ("embedded URIs") will be used.
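
To make the chosen "embedded URI" form concrete, here is a minimal sketch of how such references could be recognized. It is an illustration only, not the Docutils parser: the pattern ``EMBEDDED_URI`` and the function ``find_embedded_uris`` are hypothetical names, and the handling of whitespace inside the angle brackets (simply removed) is an assumption based on the wrapped example in (3).

```python
import re

# Hypothetical pattern for `reference text <URI>`_ and `reference text <URI>`__.
# A backslash-escaped "\<" is consumed as ordinary text, so it never opens a URI.
EMBEDDED_URI = re.compile(
    r"""
    `(?P<text>(?:\\.|[^`\\<])+?)   # reference text; backslash escapes allowed
    \s+<(?P<uri>[^<>`]+)>          # whitespace, then the URI in angle brackets
    `(?P<underscores>__?)          # closing backquote, one or two underscores
    """,
    re.VERBOSE,
)


def find_embedded_uris(source):
    """Return (reference text, URI, is_anonymous) for each embedded URI."""
    results = []
    for match in EMBEDDED_URI.finditer(source):
        # Collapse line breaks in the reference text to single spaces.
        text = " ".join(match.group("text").split())
        # Assumption: whitespace inside the brackets is only line wrapping.
        uri = "".join(match.group("uri").split())
        anonymous = match.group("underscores") == "__"
        results.append((text, uri, anonymous))
    return results


sample = (
    "This is an anonymous `reference <http://www.example.org\n"
    "/reference/>`__ of one word.  Here is a `phrase\n"
    "reference <http://www.example.org/phrase_reference/>`_."
)
for found in find_embedded_uris(sample):
    print(found)
```

The escaped title from the "HTML Anchors" problem above is not reported as an embedded URI by this sketch, because the ``\<`` is treated as part of the reference text.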
1144
Doctree Representation of Transitions
1145
=====================================
1147
(Although not reStructuredText-specific, this section fits best in
1150
Having added the "horizontal rule" construct to the `reStructuredText
1151
Markup Specification`_, a decision had to be made as to how to reflect
1152
the construct in the implementation of the document tree. Given this
1164
The horizontal rule indicates a "transition" (in prose terms) or the
1165
start of a new "division". Before implementation, the parsed document
1169
<section names="document">
1174
-------- <--- error here
1178
There are several possibilities for the implementation:
1180
1. Implement horizontal rules as "divisions" or segments. A
1181
"division" is a title-less, non-hierarchical section. The first
1182
try at an implementation looked like this::
1185
<section names="document">
1194
But the two paragraphs are really at the same level; they shouldn't
1195
appear to be at different levels. There's really an invisible
1196
"first division". The horizontal rule splits the document body
1197
into two segments, which should be treated uniformly.
1199
2. Treating "divisions" uniformly brings us to the second
1203
<section names="document">
1213
With this change, documents and sections will directly contain
1214
divisions and sections, but not body elements. Only divisions will
1215
directly contain body elements. Even without a horizontal rule
1216
anywhere, the body elements of a document or section would be
1217
contained within a division element. This makes the document tree
1218
deeper. This is similar to the way HTML_ treats document contents:
1219
grouped within a ``<body>`` element.
1221
3. Implement them as "transitions", empty elements::
1224
<section names="document">
1233
A transition would be a "point element", not containing anything,
1234
only identifying a point within the document structure. This keeps
1235
the document tree flatter, but the idea of a "point element" like
1236
"transition" smells bad. A transition isn't a thing itself, it's
1237
the space between two divisions. However, transitions are a
1240
Solution 3 was chosen for incorporation into the document tree model.
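
As a rough illustration of solution 3, the sketch below builds a flat tree in which the transition is an empty sibling of the paragraphs, and then shows that the "divisions" of solutions 1 and 2 can still be recovered when needed. It uses the standard library's ``xml.etree.ElementTree`` rather than the Docutils node classes; the helper names ``build_section`` and ``divisions`` and the ``paragraph`` element name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_section(blocks):
    """Build a flat section; None in `blocks` marks a horizontal rule."""
    section = ET.Element("section", names="document")
    for block in blocks:
        if block is None:
            # The transition is an empty "point element" between siblings.
            ET.SubElement(section, "transition")
        else:
            ET.SubElement(section, "paragraph").text = block
    return section


def divisions(section):
    """Group body elements into the divisions implied by the transitions."""
    groups = [[]]
    for child in section:
        if child.tag == "transition":
            groups.append([])
        else:
            groups[-1].append(child.text)
    return groups


tree = build_section(["Paragraph 1", None, "Paragraph 2"])
print(ET.tostring(tree, encoding="unicode"))
print(divisions(tree))
```

Both paragraphs stay at the same level in the tree, which is the property that ruled out solution 1.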
1242
.. _HTML: http://www.w3.org/MarkUp/
1245
Syntax for Line Blocks
1246
======================
1248
* An early idea: How about a literal-block-like prefix, perhaps
1249
"``;;``"? (It is, after all, a *semi-literal* literal block, no?)
1252
Take it away, Eric the Orchestra Leader! ;;
1254
A one, two, a one two three four
1256
Half a bee, philosophically,
1257
must, *ipso facto*, half not be.
1258
But half the bee has got to be,
1259
*vis a vis* its entity. D'you see?
1261
But can a bee be said to be
1262
or not to be an entire bee,
1263
when half the bee is not a bee,
1264
due to some ancient injury?
1270
* Another idea: in an ordinary paragraph, if the first line ends with
1271
a backslash (escaping the newline), interpret the entire paragraph
1272
as a verse block? For example::
1274
Add just one backslash\
1275
And this paragraph becomes
1278
(Awful, and arguably invalid, since in Japanese the word "haiku"
1279
contains three syllables not two.)
1281
This idea was superseded by the rules for escaped whitespace, useful
1282
for `character-level inline markup`_.
1284
* In a `2004-02-22 docutils-develop message`__, Jarno Elonen proposed
1285
a "plain list" syntax (and also provided a patch)::
1288
| President, SuperDuper Corp.
1291
__ http://thread.gmane.org/gmane.text.docutils.devel/1187
1293
This syntax is very natural. However, these "plain lists" seem very
1294
similar to line blocks, and I see so little intrinsic "list-ness"
1295
that I'm loath to add a new object. I used the term "blurbs" to
1296
remove the "list" connotation from the originally proposed name.
1297
Perhaps line blocks could be refined to add the two properties they
1300
A) long lines wrap nicely
1301
B) HTML output doesn't look like program code in non-CSS web
1304
(A) is an issue of all 3 aspects of Docutils: syntax (construct
1305
behaviour), internal representation, and output. (B) is partly an
1306
issue of internal representation but mostly of output.
1308
ReStructuredText will redefine line blocks with the "|"-quoting
1309
syntax. The following is my current thinking.
1315
Perhaps line block syntax like this would do::
1319
| IMF: not decided yet, but probably one of the following:
1325
Note that the "nested" list does not have nested syntax (the "|" are
1326
not further indented); the leading whitespace would still be
1327
significant somehow (more below). As for long lines in the input,
1328
this could suffice::
1331
| Founder, President, Chief Executive Officer, Cook, Bottle
1332
Washer, and All-Round Great Guy
1336
The lack of "|" on the third line indicates that it's a continuation
1337
of the second line, wrapped.
1339
I don't see much point in allowing arbitrary nested content. Multiple
1340
paragraphs or bullet lists inside a "blurb" don't make sense to me.
1341
Simple nested line blocks should suffice.
1344
Internal Representation
1345
-----------------------
1347
Line blocks are currently represented as text blobs as follows::
1349
<!ELEMENT line_block %text.model;>
1350
<!ATTLIST line_block
1354
Instead, we could represent each line by a separate element::
1356
<!ELEMENT line_block (line+)>
1357
<!ATTLIST line_block %basic.atts;>
1359
<!ELEMENT line %text.model;>
1360
<!ATTLIST line %basic.atts;>
1362
We'd keep the significance of the leading whitespace of each line
1363
either by converting it to non-breaking spaces at output, or with a
1364
per-line margin. Non-breaking spaces are simpler (for HTML, anyway)
1365
but kludgey, and wouldn't support indented long lines that wrap. But
1366
should inter-word whitespace (i.e., not leading whitespace) be
1367
preserved? Currently it is preserved in line blocks.
1369
Representing a more complex line block may be tricky::
1371
| But can a bee be said to be
1372
| or not to be an entire bee,
1373
| when half the bee is not a bee,
1374
| due to some ancient injury?
1376
Perhaps the representation could allow for nested line blocks::
1378
<!ELEMENT line_block (line | line_block)+>
1380
With this model, leading whitespace would no longer be significant.
1381
Instead, left margins are implied by the nesting. The example above
1382
could be represented as follows::
1386
But can a bee be said to be
1389
or not to be an entire bee,
1392
when half the bee is not a bee,
1395
due to some ancient injury?
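
The nested model can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the function name ``parse_line_block`` is hypothetical, nesting is derived from the whitespace after the "|" using a simple indentation stack, and a line without "|" is treated as a wrapped continuation of the previous line. The exact indentation of the input below is also an assumption.

```python
def parse_line_block(source):
    """Parse "|"-quoted lines into nested lists.

    A str is a line; a list is a (nested) line block.
    """
    root = []
    stack = [(0, root)]  # (indentation, block) pairs, innermost last
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines are simply skipped in this sketch
        if not raw.startswith("|"):
            # No "|": continuation of the previous line, wrapped in the source.
            block = stack[-1][1]
            if block and isinstance(block[-1], str):
                block[-1] = block[-1] + " " + raw.strip()
            continue
        body = raw[1:]
        text = body.strip()
        if not text:
            stack[-1][1].append("")  # an empty line keeps the current level
            continue
        # The first space after "|" is a separator, not indentation.
        indent = max(len(body) - len(body.lstrip()) - 1, 0)
        while len(stack) > 1 and indent < stack[-1][0]:
            stack.pop()  # dedent: close the deeper blocks
        if indent > stack[-1][0]:
            nested = []  # indent: open a nested line block
            stack[-1][1].append(nested)
            stack.append((indent, nested))
        stack[-1][1].append(text)
    return root


verse = (
    "| But can a bee be said to be\n"
    "|   or not to be an entire bee,\n"
    "|     when half the bee is not a bee,\n"
    "|       due to some ancient injury?\n"
)
print(parse_line_block(verse))
```

With this representation the left margin of each line follows from its nesting depth, so the amount of leading whitespace in the source no longer needs to be stored.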
1397
I wasn't sure what to do about even more complex line blocks::
1405
How should that be parsed and nested? Should the first line have
1406
the same nesting level (== indentation in the output) as the fourth
1407
line, or the same as the last line? Mark Nodine suggested that such
1408
line blocks be parsed similarly to complexly-nested block quotes,
1409
which seems reasonable. In the example above, this would result in
1410
the nesting of the first line matching the last line's nesting. In
1411
other words, the nesting would be relative to neighboring lines
1418
In HTML, line blocks are currently output as "<pre>" blocks, which
1419
gives us significant whitespace and line breaks, but doesn't allow
1420
long lines to wrap and causes monospaced output without stylesheets.
1421
Instead, we could output "<div>" elements parallelling the
1422
representation above, where each nested <div class="line_block"> would
1423
have an increased left margin (specified in the stylesheet).
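
A minimal sketch of that "<div>" output, assuming the nested representation is available as plain Python lists (a str is a line, a list is a nested line block). The function name ``render_line_block``, the ``line`` class, the use of "<div>" for individual lines, and the sample stylesheet rule are all assumptions for illustration; none of this is the html4css1.py writer.

```python
from html import escape

# Hypothetical stylesheet rule: each nested block gets a larger left margin.
STYLESHEET = "div.line_block div.line_block { margin-left: 1.5em }"


def render_line_block(block, depth=0):
    """Render nested lists as nested <div class="line_block"> elements."""
    pad = "  " * depth
    out = [f'{pad}<div class="line_block">']
    for item in block:
        if isinstance(item, list):
            out.append(render_line_block(item, depth + 1))
        else:
            # Escape markup characters so line text is output literally.
            out.append(f'{pad}  <div class="line">{escape(item)}</div>')
    out.append(f"{pad}</div>")
    return "\n".join(out)


print(render_line_block(["First, top level line", ["Second, once nested"]]))
```

Because the margin comes from the stylesheet, long lines are free to wrap and nothing forces monospaced output.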
1425
Jarno suggested the following HTML output::
1427
<div class="line_block">
1428
<span class="line">First, top level line</span><br class="hidden"/>
1429
<div class="line_block"><span class="hidden">&nbsp;</span>
1430
<span class="line">Second, once nested</span><br class="hidden"/>
1431
<span class="line">Third, once nested</span><br class="hidden"/>
1437
The ``<br class="hidden" />`` and ``<span
1438
class="hidden">&nbsp;</span>`` are meant to support non-CSS and
1439
non-graphical browsers. I understand the case for "br", but I'm not
1440
so sure about hidden "&nbsp;". I question how much effort should be
1441
put toward supporting non-graphical and especially non-CSS browsers,
1442
at least for html4css1.py output.
1444
Should the lines themselves be ``<span>`` or ``<div>``? I don't like
1445
mixing inline and block-level elements.
1451
We'll leave the old implementation in place (via the "line-block"
1452
directive only) until all Writers have been updated to support the new
1453
syntax & implementation. The "line-block" directive can then be
1454
updated to use the new internal representation, and its documentation
1455
will be updated to recommend the new syntax.
1461
The original idea came from Dylan Jay:
1463
... to use a two level bulleted list with something to
1464
indicate it should be rendered as a table ...
1466
It's an interesting idea. It could be implemented as a directive
1467
which transforms a uniform two-level list into a table. Using a
1468
directive would allow the author to explicitly set the table's
1469
orientation (by column or by row), the presence of row headers, etc.
1473
1. (Implemented in Docutils 0.3.8).
1475
Bullet-list-tables might look like this::
1487
- If we took the bones out, it wouldn't be crunchy,
1493
This list must be written in two levels. This wouldn't work::
1510
* If we took the bones out...
1512
The above is a single list of 12 items. The blank lines are not
1513
significant to the markup. We'd have to explicitly specify how
1514
many columns or rows to use, which isn't a good idea.
1516
2. Beni Cherniavsky suggested a field list alternative. It could look
1519
.. field-list-table::
1526
- :treat: Albatross!
1530
- :treat: Crunchy Frog!
1532
:descr: If we took the bones out, it wouldn't be
1533
crunchy, now would it?
1535
Column order is determined from the order of fields in the first
1536
row. Field order in all other rows is ignored. As a side-effect,
1537
this allows trivial re-arrangement of columns. By using named
1538
fields, it becomes possible to omit fields in some rows without
1539
losing track of things, which is important for spans.
1541
3. An alternative to two-level bullet lists would be to use enumerated
1542
lists for the table cells::
1554
3. If we took the bones out, it wouldn't be crunchy,
1557
That provides better correspondence between cells in the same
1558
column than does bullet-list syntax, but not as good as field list
1559
syntax. I think that were only field-list-tables available, a lot
1560
of users would use the equivalent degenerate case::
1562
.. field-list-table::
1568
4. Another natural variant is to allow a description list with field
1569
lists as descriptions::
1582
:descr: If we took the bones out, it wouldn't be
1583
crunchy, now would it?
1585
This would make the whole first column a header column ("stub").
1586
It's limited to a single column and a single paragraph fitting on
1587
one source line. Also it wouldn't allow for empty cells or row
1588
spans in the first column. But these are limitations that we could
1589
live with, like those of simple tables.
1591
The list-driven table feature could be done in many ways. Each user
1592
will have their preferred usage. Perhaps a single "list-table"
1593
directive could handle them all, depending on which options and
1594
content are present.
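
To compare the bullet-list and field-list variants, here is a small sketch of the two transformations on already-parsed data. It is not the directive implementation: the function names are hypothetical, a two-level list is modeled as a list of lists, and each field-list row is modeled as a dict of field name to cell text.

```python
def table_from_two_level_list(items, by="row"):
    """Turn a uniform two-level list into table rows.

    With by="row" each sublist is a row; with by="column" each sublist
    is a column, so the result is the transpose.
    """
    widths = {len(sub) for sub in items}
    if len(widths) != 1:
        raise ValueError("every sublist must have the same number of items")
    if by == "row":
        return [list(sub) for sub in items]
    if by == "column":
        return [list(row) for row in zip(*items)]
    raise ValueError("by must be 'row' or 'column'")


def table_from_field_lists(rows, empty=""):
    """Column order comes from the first row; later rows may reorder or
    omit fields without losing track of the columns."""
    if not rows:
        return [], []
    columns = list(rows[0])  # dicts preserve insertion order
    table = []
    for row in rows:
        unknown = set(row) - set(columns)
        if unknown:
            raise ValueError(f"unknown field(s): {sorted(unknown)}")
        table.append([row.get(name, empty) for name in columns])
    return columns, table


rows = [
    {"treat": "Albatross!", "descr": "On a stick!"},
    {"descr": "If we took the bones out, it wouldn't be crunchy, "
              "now would it?",
     "treat": "Crunchy Frog!"},
    {"treat": "Albatross!"},  # "descr" omitted: the cell is left empty
]
print(table_from_field_lists(rows))
```

The uniformity check in the first function reflects the requirement that the list be written in two levels; a flat list gives no way to infer the number of columns.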
1598
* How to indicate that there's 1 header row? Perhaps two lists? ::
1610
This is probably too subtle though. Better would be a directive
1611
option, like ``:headrows: 1``. An early suggestion for the header
1612
row(s) was to use a directive option::
1614
.. field-list-table::
1619
- :treat: Albatross!
1623
But the table data is at two levels and looks inconsistent.
1625
In general, we cannot extract the header row from field lists' field
1626
names because field names cannot contain everything one might put in
1627
a table cell. A separate header row also allows shorter field names
1628
and doesn't force one to rewrite the whole table when the header
1629
text changes. But for simpler cases, we can offer a ":header:
1630
fields" option, which does extract header cells from field names::
1632
.. field-list-table::
1635
- :Treat: Albatross!
1637
:Description: On a stick!
1639
* How to indicate the column widths? A directive option? ::
1644
Automatic defaults from the text used?
1646
* How to handle row and/or column spans?
1648
In a field list, column-spans can be indicated by specifying the
1649
first and last fields, separated by space-dash-space or ellipsis::
1652
- :foo ... baz: quuux
1654
Commas were proposed for column spans::
1658
But non-adjacent columns become problematic. Should we report an
1659
error, or duplicate the value into each span of adjacent columns (as
1660
was suggested)? The latter suggestion is appealing but may be too
1661
clever. Best perhaps to simply specify the two ends.
1663
It was suggested that comma syntax should be allowed, too, in order
1664
to allow the user to avoid trouble when changing the column order.
1665
But changing the column order of a table with spans is not trivial;
1666
we shouldn't make it easier to mess up.
1668
One possible syntax for row-spans is to simply treat any row where a
1669
field is missing as a row-span from the last row where it appeared.
1670
Leaving a field empty would still be possible by writing a field
1671
with empty content. But this is too implicit.
1673
Another way would be to require an explicit continuation marker
1674
(``...``/``-"-``/``"``?) in all but the first row of a spanned
1675
field. Empty comments could work (".."). If implemented, the same
1676
marker could also be supported in simple tables, which lack
1677
row-spanning abilities.
1679
Explicit markup like ":rowspan:" and ":colspan:" was also suggested.
1681
Sometimes in a table, the first header row contains spans. It may
1682
be necessary to provide a way to specify the column field names
1683
independently of data rows. A directive option would do it.
1685
* We could specify "column-wise" or "row-wise" ordering, with the same
1686
markup structure. For example, with definition data::
1699
- If we took the bones out, it wouldn't be
1700
crunchy, now would it?
1702
* A syntax for _`stubs in grid tables` is easy to imagine::
1704
+------------------------++------------+----------+
1705
| Header row, column 1 || Header 2 | Header 3 |
1706
+========================++============+==========+
1707
| body row 1, column 1 || column 2 | column 3 |
1708
+------------------------++------------+----------+
1710
Or this idea from Nick Moffitt::
1721
Auto-Enumerated Lists
1722
=====================
1724
Implemented 2005-03-24: combination of variation 1 & 2.
1726
The advantage of auto-numbered enumerated lists would be similar to
1727
that of auto-numbered footnotes: lists could be written and rearranged
1728
without having to manually renumber them. The disadvantages are also
1729
the same: input and output wouldn't match exactly; the markup may be
1730
ugly or confusing (depending on which alternative is chosen).
1732
1. Use the "#" symbol. Example::
1738
Advantages: simple, explicit. Disadvantages: enumeration sequence
1739
cannot be specified (limited to arabic numerals); ugly.
1741
2. As a variation on #1, first initialize the enumeration sequence?
1748
Advantages: simple, explicit, any enumeration sequence possible.
1749
Disadvantages: ugly; perhaps confusing with mixed concrete/abstract
1752
3. Alternative suggested by Fred Bremmer, from experience with MoinMoin::
1758
Advantages: enumeration sequence is explicit (could be multiple
1759
"a." or "(I)" tokens). Disadvantages: perhaps confusing; otherwise
1760
erroneous input (e.g., a duplicate item "1.") would pass silently,
1761
either causing a problem later in the list (if no blank lines
1762
between items) or creating two lists (with blanks).
1764
Take this input for example::
1768
1. Unintentional duplicate of item 1.
1772
Currently the parser will produce two lists, "1" and "1,2" (no
1773
warnings, because of the presence of blank lines). Using Fred's
1774
notation, the current behavior is "1,1,2 -> 1 1,2" (without blank
1775
lines between items, it would be "1,1,2 -> 1 [WARNING] 1,2"). What
1776
should the behavior be with auto-numbering?
1778
Fred has produced a patch__, whose initial behavior is as follows::
1783
1,2,2,3 -> 1,2,3 [WARNING] 3
1784
1,1,2 -> 1,2 [WARNING] 2
1786
(After the "[WARNING]", the "3" would begin a new list.)
1788
I have mixed feelings about adding this functionality to the spec &
1789
parser. It would certainly be useful to some users (myself
1790
included; I often have to renumber lists). Perhaps it's too
1791
clever, asking the parser to guess too much. What if you *do* want
1792
three one-item lists in a row, each beginning with "1."? You'd
1793
have to use empty comments to force breaks. Also, I question
1794
whether "1,2,2 -> 1,2,3" is optimal behavior.
1796
In response, Fred came up with "a stricter and more explicit rule
1797
[which] would be to only auto-number silently if *all* the
1798
enumerators of a list were identical". In that case::
1801
1,2,2 -> 1,2 [WARNING] 2
1803
1,2,2,3 -> 1,2 [WARNING] 2,3
1804
1,1,2 -> 1,2 [WARNING] 2
1806
Should any start-value be allowed ("3,3,3"), or should
1807
auto-numbered lists be limited to begin with ordinal-1 ("1", "A",
1810
__ http://sourceforge.net/tracker/index.php?func=detail&aid=548802
1811
&group_id=38414&atid=422032
1813
4. Alternative proposed by Tony Ibbs::
1816
#3. Aha - I edited this in later.
1819
The initial proposal required unique enumerators within a list, but
1820
this limits the convenience of a feature of already limited
1821
applicability and convenience. Not a useful requirement; dropped.
1823
Instead, simply prepend a "#" to a standard list enumerator to
1824
indicate auto-enumeration. The numbers (or letters) of the
1825
enumerators themselves are not significant, except:
1827
- as a sequence indicator (arabic, roman, alphabetic; upper/lower),
1829
- and perhaps as a start value (first list item).
1831
Advantages: explicit, any enumeration sequence possible.
1832
Disadvantages: a bit ugly.
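
For reference, Fred's stricter rule from alternative 3 (auto-number silently only if all the enumerators of a list are identical) can be sketched as follows. This is one possible reading of the rule, not the patch itself: the function names are hypothetical, only arabic numerals are handled, any start value is accepted, and each list after the first in the result stands for one "[WARNING]" in the notation used above.

```python
def renumber(items):
    """Number a list consecutively, starting from its first enumerator."""
    return [items[0] + offset for offset in range(len(items))]


def split_enumerated(enumerators):
    """Split adjacent enumerated items into lists under the stricter rule."""
    lists = []
    current = []  # enumerators as written, for the list being built
    mode = None   # None (one item so far), "auto", or "sequence"
    for value in enumerators:
        if not current:
            current, mode = [value], None
            continue
        # The second item of a list decides how the list is numbered.
        if mode is None and value == current[0]:
            mode = "auto"
        elif mode is None and value == current[-1] + 1:
            mode = "sequence"
        fits = ((mode == "auto" and value == current[0]) or
                (mode == "sequence" and value == current[-1] + 1))
        if fits:
            current.append(value)
        else:
            # Neither a repeat nor the next number: warn, start a new list.
            lists.append(renumber(current))
            current, mode = [value], None
    if current:
        lists.append(renumber(current))
    return lists


for example in ([1, 1, 1], [1, 2, 2], [1, 2, 2, 3], [1, 1, 2]):
    print(example, "->", split_enumerated(example))
```

The sketch makes the cost of the rule visible: the parser has to look at the second item before it knows how to treat the first.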
1842
As a further wrinkle (see `Reworking Explicit Markup (Round 1)`_
1843
above), in the wee hours of 2002-02-28 I posted several ideas for
1844
changes to footnote syntax:
1846
- Change footnote syntax from ``.. [1]`` to ``_[1]``? ...
1847
- Differentiate (with new DTD elements) author-date "citations"
1848
(``[GVR2002]``) from numbered footnotes? ...
1849
- Render footnote references as superscripts without "[]"? ...
1851
These ideas are all related, and suggest changes in the
1852
reStructuredText syntax as well as the docutils tree model.
1854
The footnote has been used for both true footnotes (asides expanding
1855
on points or defining terms) and for citations (references to external
1856
works). Rather than dealing with one amalgam construct, we could
1857
separate the current footnote concept into strict footnotes and
1858
citations. Citations could be interpreted and treated differently
1859
from footnotes. Footnotes would be limited to numerical labels:
1860
manual ("1") and auto-numbered (anonymous "#", named "#label").
1862
The footnote is the only explicit markup construct (starts with ".. ")
1863
that directly translates to a visible body element. I've always been
1864
a little bit uncomfortable with the ".. " marker for footnotes because
1865
of this; ".. " has a connotation of "special", but footnotes aren't
1866
especially "special". Printed texts often put footnotes at the bottom
1867
of the page where the reference occurs (thus "foot note"). Some HTML
1868
designs would leave footnotes to be rendered in the same positions where
1869
they're defined. Other online and printed designs will gather
1870
footnotes into a section near the end of the document, converting them
1871
to "endnotes" (perhaps using a directive in our case); but this
1872
"special processing" is not an intrinsic property of the footnote
1873
itself, but a decision made by the document author or processing
1876
Citations are almost invariably collected in a section at the end of a
1877
document or section. Citations "disappear" from where they are
1878
defined and are magically reinserted at some well-defined point.
1879
There's more of a connection to the "special" connotation of the ".. "
1880
syntax. The point at which the list of citations is inserted could be
1881
defined manually by a directive (e.g., ".. citations::"), and/or have
1882
default behavior (e.g., a section automatically inserted at the end of
1883
the document) that might be influenced by options to the Writer.
1892
.. [#] Auto-numbered footnote.
1893
.. [#label] Auto-labeled footnote.
1895
- The syntax proposed in the original 2002-02-28 Doc-SIG post:
1896
remove the ".. ", prefix a "_"::
1899
_[#] Auto-numbered footnote.
1900
_[#label] Auto-labeled footnote.
1902
The leading underscore syntax (earlier dropped because
1903
``.. _[1]:`` was too verbose) is a useful reminder that footnotes
1904
are hyperlink targets.
1906
- Minimal syntax: remove the ".. [" and "]", prefix a "_", and
1910
_#. Auto-numbered footnote.
1911
_#label. Auto-labeled footnote.
1913
``_1.``, ``_#.``, and ``_#label.`` are markers,
1916
Footnotes could be rendered something like this in HTML
1918
| 1. This is a footnote. The brackets could be dropped
1919
| from the label, and a vertical bar could set them
1920
| off from the rest of the document in the HTML.
1922
Two-way hyperlinks on the footnote marker ("1." above) would also
1923
help to differentiate footnotes from enumerated lists.
1925
If converted to endnotes (by a directive/transform), a horizontal
1926
half-line might be used instead. Page-oriented output formats
1927
would typically use the horizontal line for true footnotes.
1929
+ Footnote references:
1933
[1]_, [#]_, [#label]_
1935
- Minimal syntax to match the minimal footnote syntax above::
1939
As a consequence, pure-numeric hyperlink references would not be
1940
possible; they'd be interpreted as footnote references.
1942
+ Citation references: no change is proposed from the current footnote
1949
- Current syntax (footnote syntax)::
1951
.. [GVR2001] Python Documentation; van Rossum, Drake, et al.;
1952
http://www.python.org/doc/
1954
- Possible new syntax::
1956
_[GVR2001] Python Documentation; van Rossum, Drake, et al.;
1957
http://www.python.org/doc/
1960
Docutils: Python Documentation Utilities project; Goodger
1961
et al.; http://docutils.sourceforge.net/
1963
Without the ".. " marker, subsequent lines would either have to
1964
align as in one of the above, or we'd have to allow loose
1965
alignment (I'd rather not)::
1967
_[GVR2001] Python Documentation; van Rossum, Drake, et al.;
1968
http://www.python.org/doc/
1970
I proposed adopting the "minimal" syntax for footnotes and footnote
1971
references, and adding citations and citation references to
1972
reStructuredText's repertoire. The current footnote syntax for
1973
citations is better than the alternatives given.
1975
From a reply by Tony Ibbs on 2002-03-01:
1977
However, I think easier with examples, so let's create one::
1979
Fans of Terry Pratchett are perhaps more likely to use
1980
footnotes [1]_ in their own writings than other people
1981
[2]_. Of course, in *general*, one only sees footnotes
1982
in academic or technical writing - it's use in fiction
1983
and letter writing is not normally considered good
1984
style [4]_, particularly in emails (not a medium that
1985
lends itself to footnotes).
1987
.. [1] That is, little bits of referenced text at the
1989
.. [2] Because Terry himself does, of course [3]_.
1990
.. [3] Although he has the distinction of being
1991
*funny* when he does it, and his fans don't always
1993
.. [4] Presumably because it detracts from linear
1994
reading of the text - this is, of course, the point.
1996
and look at it with the second syntax proposal::
1998
Fans of Terry Pratchett are perhaps more likely to use
1999
footnotes [1]_ in their own writings than other people
2000
[2]_. Of course, in *general*, one only sees footnotes
2001
in academic or technical writing - it's use in fiction
2002
and letter writing is not normally considered good
2003
style [4]_, particularly in emails (not a medium that
2004
lends itself to footnotes).
2006
_[1] That is, little bits of referenced text at the
2008
_[2] Because Terry himself does, of course [3]_.
2009
_[3] Although he has the distinction of being
2010
*funny* when he does it, and his fans don't always
2012
_[4] Presumably because it detracts from linear
2013
reading of the text - this is, of course, the point.
2015
(I note here that if I have gotten the indentation of the
2016
footnotes themselves correct, this is clearly not as nice. And if
2017
the indentation should be to the left margin instead, I like that
2020
and the third (new) proposal::
2022
Fans of Terry Pratchett are perhaps more likely to use
2023
footnotes 1_ in their own writings than other people
2024
2_. Of course, in *general*, one only sees footnotes
2025
in academic or technical writing - it's use in fiction
2026
and letter writing is not normally considered good
2027
style 4_, particularly in emails (not a medium that
2028
lends itself to footnotes).
2030
_1. That is, little bits of referenced text at the
2032
_2. Because Terry himself does, of course 3_.
2033
_3. Although he has the distinction of being
2034
*funny* when he does it, and his fans don't always
2036
_4. Presumably because it detracts from linear
2037
reading of the text - this is, of course, the point.
2039
I think I don't, in practice, mind the targets too much (the use
2040
of a dot after the number helps a lot here), but I do have a
2041
problem with the body text, in that I don't naturally separate out
2042
the footnotes as different than the rest of the text - instead I
2043
keep wondering why there are numbers interspersed in the text. The
2044
use of brackets around the numbers ([ and ]) made me somehow parse
2045
the footnote references as "odd" - i.e., not part of the body text
2046
- and thus both easier to skip, and also (paradoxically) easier to
2047
pick out so that I could follow them.
2049
Thus, for the moment (and as always susceptible to argument), I'd
2050
say -1 on the new form of footnote reference (i.e., I much prefer
2051
the existing ``[1]_`` over the proposed ``1_``), and ambivalent
2052
over the proposed target change.
2054
That leaves David's problem of wanting to distinguish footnotes
2055
and citations - and the only thing I can propose there is that
2056
footnotes are numeric or # and citations are not (which, as a
2057
human being, I can probably cope with!).

From a reply by Paul Moore on 2002-03-01:

    I think the current footnote syntax ``[1]_`` is *exactly* the
    right balance of distinctness vs unobtrusiveness. I very
    definitely don't think this should change.

    On the target change, it doesn't matter much to me.

From a further reply by Tony Ibbs on 2002-03-01, referring to the
"[1]" form and actual usage in email:

    Clearly this is a form people are used to, and thus we should
    consider it strongly (in the same way that the usage of ``*..*``
    to mean emphasis was taken partly from email practise).

    Equally clearly, there is something "magical" for people in the
    use of a similar form (i.e., ``[1]``) for both footnote reference
    and footnote target - it seems natural to keep them similar.

    I think that this established plaintext usage leads me to strongly
    believe we should retain square brackets at both ends of a
    footnote. The markup of the reference end (a single trailing
    underscore) seems about as minimal as we can get away with. The
    markup of the target end depends on how one envisages the thing -
    if ".." means "I am a target" (as I tend to see it), then that's
    good, but one can also argue that the "_[1]" syntax has a neat
    symmetry with the footnote reference itself, if one wishes (in
    which case ".." presumably means "hidden/special" as David seems
    to think, which is why one needs a ".." *and* a leading underline
    for hyperlink targets).

Given the persuasive arguments voiced, we'll leave footnote & footnote
reference syntax alone, except that these discussions gave rise to
the "auto-symbol footnote" concept, which has been added. Citations
and citation references have also been added.
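
The rule of thumb proposed in the discussion (footnote labels are
numeric or ``#``, citation labels are anything else) can be sketched in
Python. This is only an illustration, not the Docutils parser: the
name ``classify_label`` and the regular expression are assumptions
made for this example, and the ``*`` case stands for the auto-symbol
footnote mentioned above.

```python
import re

# Hypothetical helper, not part of Docutils: classify the label found
# between the square brackets of a reference such as [1]_ or [CIT2002]_.
FOOTNOTE_LABEL = re.compile(r"^(\d+|#[A-Za-z0-9_.-]*|\*)$")


def classify_label(label):
    """Return "footnote" for numeric, "#"-prefixed, or "*" labels,
    and "citation" for anything else."""
    return "footnote" if FOOTNOTE_LABEL.match(label) else "citation"
```

Under these assumptions ``classify_label("1")`` and
``classify_label("#note")`` are footnotes, while
``classify_label("CIT2002")`` is a citation.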


Syntax for Questions & Answers
==============================

Implement as a generic two-column marked list? As a standalone
(non-directive) construct? (Is the markup ambiguous?) Add support to

New elements would be required. Perhaps::

    <!ELEMENT question_list (question_list_item+)>
    <!ATTLIST question_list
         numbering (none | local | global)
         start     NUMBER   #IMPLIED>
    <!ELEMENT question_list_item (question, answer*)>
    <!ELEMENT question %text.model;>
    <!ELEMENT answer (%body.elements;)+>

Originally I thought of implementing a Q&A list with special syntax::

    A: You are a question-and-answer

    A: I am the omniscient "we".

Where each "Q" and "A" could also be numbered (e.g., "Q1"). However,
a simple enumerated or bulleted list will do just fine for syntax. A
directive could treat the list specially; e.g. the first paragraph
could be treated as a question, the remainder as the answer (multiple
answers could be represented by nested lists). Without special
syntax, this directive becomes low priority.
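
The "first paragraph is the question, the remainder is the answer"
convention is simple enough to sketch. The function below is
hypothetical (no such directive or helper exists in Docutils) and
assumes each list item has already been reduced to a list of
paragraph strings.

```python
def split_question_answer(item_paragraphs):
    """Treat the first paragraph of a list item as the question and
    the remaining paragraphs as the answer (illustrative helper)."""
    if not item_paragraphs:
        raise ValueError("a Q&A list item needs at least a question")
    question, *answer = item_paragraphs
    return question, answer
```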

As described in the FAQ__, no special syntax or directive is needed
for this application.

__ http://docutils.sf.net/FAQ.html
   #how-can-i-mark-up-a-faq-or-other-list-of-questions-answers


Reworking Explicit Markup (Round 2)
===================================

See `Reworking Explicit Markup (Round 1)`_ for an earlier discussion.

In April 2004, a new thread began on docutils-develop: `Inconsistency
in RST markup`__. Several arguments were made; the first argument
begat later arguments. Below, the arguments are paraphrased "in
quotes", with responses.

__ http://thread.gmane.org/gmane.text.docutils.devel/1386

1. References and targets take this form::

       .. _targetname: stuff

   But footnotes, "which generate links just like targets do", are

   "Footnotes should be written as"::

   But they're not the same type of animal. That's not a "footnote
   target", it's a *footnote*. Being a target is not a footnote's
   primary purpose (an arguable point). It just happens to grow a
   target automatically, for convenience. Just as a section title::

   isn't a "title target", it's a *title*, which happens to grow a
   target automatically. The consistency is there, it's just deeper
   than at first glance.

   Also, ".. [1]" was chosen for footnote syntax because it closely
   resembles one form of actual footnote rendering. ".. _[1]:" is too
   verbose; excessive punctuation is required to get the job done.

   For more of the reasoning behind the syntax, see `Problems With
   StructuredText (Hyperlinks) <problems.html#hyperlinks>`__ and
   `Reworking Footnotes`_.

2. "I expect directives to also look like ``.. this:`` [one colon]
   because that also closely parallels the link and footnote target

   There are good reasons for the two-colon syntax:

       Two colons are used after the directive type for these reasons:

       - Two colons are distinctive, and unlikely to be used in common

       - Two colons avoids clashes with common comment text like::

             .. Danger: modify at your own risk!

       - If an implementation of reStructuredText does not recognize a
         directive (i.e., the directive-handler is not installed), a
         level-3 (error) system message is generated, and the entire
         directive block (including the directive itself) will be
         included as a literal block. Thus "::" is a natural choice.

       -- `restructuredtext.html#directives
       <../../ref/rst/restructuredtext.html#directives>`__

   The last reason is not particularly compelling; it's more of a
   convenient coincidence or mnemonic.

3. "Comments always seemed too easy. I almost never write comments.
   I'd have no problem writing '.. comment:' in front of my comments.
   In fact, it would probably be more readable, as comments *should*
   be set off strongly, because they are very different from normal

   Many people do use comments though, and some applications of
   reStructuredText require them. For example, all reStructuredText
   PEPs (and this document!) have an Emacs stanza at the bottom, in a
   comment. Having to write ".. comment::" would be very obtrusive.

   Comments *should* be dirt-easy to do. It should be easy to
   "comment out" a block of text. Comments in programming languages
   and other markup languages are invariably easy.

   Any author is welcome to preface their comments with "Comment:" or
   "Do Not Print" or "Note to Editor" or anything they like. A
   "comment" directive could easily be implemented. It might be
   confused with admonition directives, like "note" and "caution"
   though. In unrelated (and unpublished and unfinished) work, adding
   a "comment" directive as a true document element was considered::

       If structure is necessary, we could use a "comment" directive
       (to avoid nonsensical DTD changes, the "comment" directive
       could produce an untitled topic element).

4. "One of the goals of reStructuredText is to be *readable* by people
   who don't know it. This construction violates that: it is not at
   all obvious to the uninitiated that text marked by '..' is a
   comment. On the other hand, '.. comment:' would be totally

   Totally transparent, perhaps, but also very obtrusive. Another of
   `reStructuredText's goals`_ is to be unobtrusive, and
   ".. comment::" would violate that. The goals of reStructuredText
   are many, and they conflict. Determining the right set of goals
   and finding solutions that best fit is done on a case-by-case

   Even readability has two aspects. Being readable without any
   prior knowledge is one. Being as easily read in raw form as in
   processed form is the other. ".." may not contribute to the former
   aspect, but ".. comment::" would certainly detract from the latter.

   .. _reStructuredText's goals: ../../ref/rst/introduction.html#goals

5. "Recently I sent someone an rst document, and they got confused; I
   had to explain to them that '..' marks comments, *unless* it's a

   The explanation of directives *is* roundabout, defining comments in
   terms of not being other things. That's definitely a wart.

6. "Under the current system, a mistyped directive (with ':' instead
   of '::') will be silently ignored. This is an error that could
   easily go unnoticed."

   A parser option/setting like "--comments-on-stderr" would help.

7. "I'd prefer to see double-dot-space / command / double-colon as the
   standard Docutils markup-marker. It's unusual enough to avoid
   being accidentally used. Everything that starts with a double-dot
   should end with a double-colon."

   That would increase the punctuation verbosity of some constructs

8. Edward Loper proposed the following plan for backwards

       1. ".. foo" will generate a deprecation warning to stderr, and
          nothing in the output (no system messages).
       2. ".. foo: bar" will be treated as a directive foo. If there
          is no foo directive, then do the normal error output.
       3. ".. foo:: bar" will generate a deprecation warning to
          stderr, and be treated as a directive. Or leave it valid?

       So some existing documents might start printing deprecation
       warnings, but the only existing documents that would *break*
       would be ones that say something like::

           .. warning: this should be a comment

           .. warning:: this should be a comment

       Here, we're trading a fairly common silent error (directive
       falsely treated as a comment) for a fairly uncommon explicitly
       flagged error (comment falsely treated as directive). To make
       things even easier, we could add a sentence to the
       unknown-directive error. Something like "If you intended to
       create a comment, please use '.. comment:' instead".

On one hand, I understand and sympathize with the points raised. On
the other hand, I think the current syntax strikes the right balance
(but I acknowledge a possible lack of objectivity). On the gripping
hand, the comment and directive syntax has become well established, so
even if it's a wart, it may be a wart we have to live with.

Making any of these changes would cause a lot of breakage or at least
deprecation warnings. I'm not sure the benefit is worth the cost.

For now, we'll treat this as an unresolved legacy issue.
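
To make the "silently ignored" case from argument 6 concrete, here is
a rough Python sketch of how explicit markup lines fall into
categories, with a check that flags a likely mistyped directive. It
is not the Docutils implementation: the regular expressions are
simplified, and ``classify`` and ``known_directives`` are names
invented for this illustration.

```python
import re

# Simplified patterns; the real parser is considerably more thorough.
FOOTNOTE = re.compile(r"^\.\. \[(\d+|#\S*|\*)\]( |$)")
TARGET = re.compile(r"^\.\. _[^:]+:( |$)")
DIRECTIVE = re.compile(r"^\.\. ([A-Za-z][\w-]*)::( |$)")
SUSPICIOUS = re.compile(r"^\.\. ([A-Za-z][\w-]*):( |$)")


def classify(line, known_directives=("note", "warning", "image")):
    """Classify one explicit-markup line.

    `known_directives` is an illustrative stand-in for the registry
    of installed directives.
    """
    if FOOTNOTE.match(line):
        return "footnote"
    if TARGET.match(line):
        return "target"
    if DIRECTIVE.match(line):
        return "directive"
    match = SUSPICIOUS.match(line)
    if match and match.group(1).lower() in known_directives:
        # One colon after a known directive name: probably a typo.
        return "comment (possible mistyped directive)"
    if line.startswith(".. ") or line == "..":
        return "comment"
    return "not explicit markup"
```

With these assumptions, ``.. warning: text`` is reported as a comment
that may be a mistyped directive, while ``.. Danger: modify at your
own risk!`` remains an ordinary comment because "danger" is not in
the illustrative directive list.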


Nested Inline Markup
====================

These are collected notes on a long-discussed issue. The original
mailing list messages should be referred to for details.

* In a 2001-10-31 discussion I wrote:

      Try, for example, `Ed Loper's 2001-03-21 post`_, which details
      some rules for nested inline markup. I think the complexity is
      prohibitive for the marginal benefit. (And if you can understand
      that tree without going mad, you're a better man than I. ;-)

      Inline markup is already fragile. Allowing nested inline markup
      would only be asking for trouble IMHO. If it proves absolutely
      necessary, it can be added later. The rules for what can appear
      inside what must be well thought out first though.

      .. _Ed Loper's 2001-03-21 post:
         http://mail.python.org/pipermail/doc-sig/2001-March/001487.html

      -- http://mail.python.org/pipermail/doc-sig/2001-October/002354.html

* In a 2001-11-09 Doc-SIG post, I wrote:

      The problem is that in the
      what-you-see-is-more-or-less-what-you-get markup language that
      is reStructuredText, the symbols used for inline markup ("*",
      "**", "`", "``", etc.) may preclude nesting.

  I've rethought this position. Nested markup is not precluded, just
  tricky. People and software parse "double and 'single' quotes" all
  the time. Continuing,

      I've thought over how we might implement nested inline
      markup. The first algorithm ("first identify the outer inline
      markup as we do now, then recursively scan for nested inline
      markup") won't work; counterexamples were given in my `last post
      <http://mail.python.org/pipermail/doc-sig/2001-November/002363.html>`__.

      The second algorithm makes my head hurt::

          scan for start-string
              scan for start or end string
              if new start string found:
              elif matching end string found:
              elif non-matching end string found:
                  if it's a markup error:
                  elif the initial start-string was misinterpreted:
                      # e.g. in this case: ***strong** in emphasis*
                      restart with the other interpretation
                      # but it might be several layers back ...

      This is similar to how the parser does section title
      recognition, but sections are much more regular and

      Bottom line is, I don't think the benefits are worth the effort,
      even if it is possible. I'm not going to try to write the code,
      at least not now. If somebody codes up a consistent, working,
      general solution, I'll be happy to consider it.

      -- http://mail.python.org/pipermail/doc-sig/2001-November/002388.html

* In a `2003-05-06 Docutils-Users post`__ Paul Tremblay proposed a new
  syntax to allow for easier nesting. It eventually evolved into

  The duplication with the existing interpreted text syntax is

  __ http://article.gmane.org/gmane.text.docutils.user/317

* Could the parser be extended to parse nested interpreted text? ::

      :emphasis:`Some emphasized text with :strong:`some more
      emphasized text` in it and **perhaps** :reference:`a link``

* In a `2003-06-18 Docutils-Develop post`__, Mark Nodine reported on
  his implementation of a form of nested inline markup in his
  Perl-based parser (unpublished). He brought up some interesting
  ideas. The implementation was flawed, however, by the change in
  semantics required for backslash escapes.

  __ http://article.gmane.org/gmane.text.docutils.devel/795

* Docutils-develop threads between David Abrahams, David Goodger, and
  Mark Nodine (beginning 2004-01-16__ and 2004-01-19__) hashed out
  many of the details of a potentially successful implementation, as
  described below. David Abrahams checked in code to the "nesting"
  branch of CVS, awaiting thorough review.

  __ http://thread.gmane.org/gmane.text.docutils.devel/1102
  __ http://thread.gmane.org/gmane.text.docutils.devel/1125

It may be possible to accomplish nested inline markup in general with
a more powerful inline markup parser. There may be some issues, but
I'm not averse to the idea of nested inline markup in general. I just
don't have the time or inclination to write a new parser now. Of
course, a good patch would be welcome!

I envisage something like this. Explicit-role interpreted text must
be nestable. Prefix-based is probably preferred, since suffix-based
will look like inline literals::

    ``text`:role1:`:role2:

But it can be disambiguated, so it ought to be left up to the author::

    `\ `text`:role1:`:role2:

In addition, other forms of inline markup may be nested if

    *emphasized ``literal`` and |substitution ref| and link_*

IOW, the parser ought to be as permissive as possible.
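
For readers who want to see the stack-based idea in running form, the
following Python sketch parses nested ``*emphasis*``, ``**strong**``
and inline literals into a small tree. It is an illustration only and
is not the code from the "nesting" branch: the function name
``parse_inline``, the tuple-based tree, and the greedy tokenizer are
all assumptions of this example, and it makes no attempt at the
"restart with the other interpretation" step quoted earlier.

```python
import re

TOKEN = re.compile(r"(\*\*|\*|``)")
NAMES = {"*": "emphasis", "**": "strong", "``": "literal"}


def parse_inline(text):
    """Parse nested inline markup into (name, children) tuples and
    plain strings, using a stack of currently open start-strings."""
    root = ("root", [])
    stack = [(None, root)]            # (open delimiter, node)
    for token in TOKEN.split(text):
        if not token:
            continue
        delim, node = stack[-1]
        if token not in NAMES:
            node[1].append(token)     # plain text
        elif token == delim:
            stack.pop()               # matching end-string
        elif delim == "``":
            node[1].append(token)     # no markup inside literals
        else:
            child = (NAMES[token], [])
            node[1].append(child)     # new start-string: nest
            stack.append((token, child))
    if len(stack) > 1:
        raise ValueError("unclosed inline markup: %r" % stack[-1][0])
    return root[1]
```

Because the tokenizer always prefers ``**`` over ``*``, this sketch
rejects ``***strong** in emphasis*`` with a ``ValueError`` instead of
backtracking, which is exactly the misinterpretation case the quoted
algorithm worries about.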


Index Entries & Indexes
=======================

Were I writing a book with an index, I guess I'd need two
different kinds of index targets: inline/implicit and
out-of-line/explicit. For example::

    In this `paragraph`:index:, several words are being
    `marked`:index: inline as implicit `index`:index:

    The explicit index directives above would refer to
    this paragraph. It might also make sense to allow multiple
    entries in an ``index`` directive:

The words "paragraph", "marked", and "index" would become index
entries pointing at the words in the first paragraph. The index
entry words appear verbatim in the text. (Don't worry about the
ugly ":index:" part; if indexing is the only/main application of
interpreted text in your documents, it can be implicit and
omitted.) The two directives provide manual indexing, where the
index entry words ("markup" and "syntax") do not appear in the
main text. We could combine the two directives into one::

    .. index:: markup; syntax

Semicolons instead of commas because commas could *be* part of the
index target, like::

    .. index:: van Rossum, Guido

Another reason for index directives is because other inline markup
wouldn't be possible within inline index targets.

Sometimes index entries have multiple levels. Given::

    .. index:: statement syntax: expression statements

In a hypothetical index, combined with other entries, it might

        expression statements ..... 56
        assignment ................ 57
        simple statements ......... 58
        compound statements ....... 60
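
The argument conventions discussed here (semicolons between entries,
commas allowed inside an entry, and a colon separating the levels of
a multi-level entry) could be parsed as in the sketch below. No such
directive exists in Docutils; ``parse_index_entries`` and its return
format are hypothetical.

```python
def parse_index_entries(argument):
    """Split a proposed "index" directive argument into entries.

    Entries are separated by semicolons.  A "parent: child" entry is
    returned as (parent, child); a single-level entry as (None, entry).
    """
    entries = []
    for part in argument.split(";"):
        part = part.strip()
        if not part:
            continue
        parent, sep, child = part.partition(":")
        if sep:
            entries.append((parent.strip(), child.strip()))
        else:
            entries.append((None, part))
    return entries
```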

Inline multi-level index targets could be done too. Perhaps

    When dealing with `expression statements <statement syntax:>`,
    we must remember ...

The opposite sense could also be possible::

    When dealing with `index entries <:multi-level>`, there are
    many permutations to consider.

Also "see / see also" index entries.

    .. index:: paragraph

(The "index" directive above actually targets the *preceding*
object.) The directive should produce something like this XML::

        <index_entry text="paragraph"/>

This kind of content model would also allow true inline

    Here's a `paragraph`:index:.

If the "index" role were the default for the application, it could be

    Here's a `paragraph`.

Both of these would result in this XML::

        Here's a <index_entry>paragraph</index_entry>.


from 2002-06-24 docutils-develop posts
--------------------------------------

    If all of your index entries will appear verbatim in the text,
    this should be sufficient. If not (e.g., if you want "Van Rossum,
    Guido" in the index but "Guido van Rossum" in the text), we'll
    have to figure out a supplemental mechanism, perhaps using

I've thought a bit more on this, and I came up with two possibilities:

1. Using interpreted text, embed the index entry text within the

       ... by `Guido van Rossum [Van Rossum, Guido]` ...

   The problem with this is obvious: the text becomes cluttered and
   hard to read. The processed output would drop the text in
   brackets, which goes against the spirit of interpreted text.

2. Use substitutions::

       ... by |Guido van Rossum| ...

       .. |Guido van Rossum| index:: Van Rossum, Guido

   A problem with this is that each substitution definition must have
   a unique name. A subsequent ``.. |Guido van Rossum| index:: BDFL``
   would be illegal. Some kind of anonymous substitution definition
   mechanism would be required, but I think that's going too far.

Both of these alternatives are flawed. Any other ideas?


This is the realm of the possible but questionably probable. These
ideas are kept here as a record of what has been proposed, for
posterity and in case any of them prove to be useful.


Compound Enumerated Lists
=========================

Allow for compound enumerators, such as "1.1." or "1.a." or "1(a)", to
allow for nested enumerated lists without indentation?
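
As a rough feasibility check, the three compound forms named above
can be recognized with a single regular expression. The pattern and
the helper ``split_compound_enumerator`` are assumptions made for
illustration; nothing like this is in the Docutils parser.

```python
import re

# One enumerator component: a number or a single letter.
PART = r"\d+|[A-Za-z]"
ENUMERATOR = re.compile(
    r"^(?:{p})(?:\.(?:{p})|\((?:{p})\))+\.?(?=\s)".format(p=PART))


def split_compound_enumerator(line):
    """Return the components of a leading compound enumerator such as
    "1.1.", "1.a." or "1(a)", or None if the line has none."""
    match = ENUMERATOR.match(line)
    if match is None:
        return None
    return re.findall(PART, match.group(0))
```

Note that ordinary prose beginning with "e.g. " also matches this
pattern (yielding ``["e", "g"]``), which hints at the kind of
ambiguity such a syntax would introduce.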


Allow for variant styles by interpreting indented lists as if they
weren't indented? For example, currently the list below will be
parsed as a list within a block quote::

But a lot of people seem to write that way, and HTML browsers make it
look as if that's the way it should be. The parser could check the
contents of block quotes, and if they contain only a single list,
remove the block quote wrapper. There would be two problems:

1. What if we actually *do* want a list inside a block quote?

2. What if such a list comes immediately after an indented construct,
   such as a literal block?

Both could be solved using empty comments (problem 2 already exists
for a block quote after a literal block). But that's a hack.
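
The "remove the block quote wrapper" check described above amounts to
a small tree rewrite. The sketch below uses a toy document tree of
``(name, children)`` tuples rather than real Docutils nodes; the
element names follow the Docutils doctree, but the function
``unwrap_list_only_quotes`` and the tuple representation are
assumptions of this example.

```python
def unwrap_list_only_quotes(node):
    """Replace every block quote whose only child is a list by that
    list.  Nodes are (name, children) tuples; leaves are strings."""
    if isinstance(node, str):
        return node
    name, children = node
    children = [unwrap_list_only_quotes(child) for child in children]
    if (name == "block_quote" and len(children) == 1
            and not isinstance(children[0], str)
            and children[0][0] in ("bullet_list", "enumerated_list")):
        return children[0]
    return (name, children)
```

A block quote that contains anything besides a single list is left
untouched, so problem 1 (a list that really should be quoted) would
still need an explicit escape such as an empty comment.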

Perhaps a runtime setting, allowing or disabling this convenience,
would be appropriate. But that raises issues too:

    User A, who writes lists indented (and their config file is set up
    to allow it), sends a file to user B, who doesn't (and their
    config file disables indented lists). The result of processing by
    the two users will be different.

It may seem minor, but it adds ambiguity to the parser, which is bad.

See the `Doc-SIG discussion starting 2001-04-18`__ with Ed Loper's
"Structuring: a summary; and an attempt at EBNF", item 4 (and
follow-ups, here__ and here__). Also `docutils-users, 2003-02-17`__
and `beginning 2003-08-04`__.

__ http://mail.python.org/pipermail/doc-sig/2001-April/001776.html
__ http://mail.python.org/pipermail/doc-sig/2001-April/001789.html
__ http://mail.python.org/pipermail/doc-sig/2001-April/001793.html
__ http://sourceforge.net/mailarchive/message.php?msg_id=3838913
__ http://sf.net/mailarchive/forum.php?thread_id=2957175&forum_id=11444


Sloppy Indentation of List Items
================================

Perhaps the indentation shouldn't be so strict. Currently, this is

Anything wrong with this? ::

       Block quote. (no good: requires some indent relative to first

    2. Have to carefully define where the literal block ends::

Hmm... Non-strict indentation isn't such a good idea.


Lazy Indentation of List Items
==============================

Another approach: Going back to the first draft of reStructuredText
(2000-11-27 post to Doc-SIG)::

    - This is the fourth item of the main list (no blank line above).
    The second line of this item is not indented relative to the
    bullet, which precludes it from having a second paragraph.

Change that to *require* a blank line above and below, to reduce
ambiguity. This "loosening" may be added later, once the parser's
been nailed down. However, a serious drawback of this approach is
that it limits the content of each list item to a single paragraph.


David's Idea for Lazy Indentation
---------------------------------

Consider a paragraph in a word processor. It is a single logical line
of text which ends with a newline, soft-wrapped arbitrarily at the
right edge of the page or screen. We can think of a plaintext
paragraph in the same way, as a single logical line of text, ending
with two newlines (a blank line) instead of one, and which may contain
arbitrary line breaks (newlines) where it was accidentally
hard-wrapped by an application. We can compensate for the accidental
hard-wrapping by "unwrapping" every unindented second and subsequent
line. The indentation of the first line of a paragraph or list item
would determine the indentation for the entire element. Blank lines
would be required between list items when using lazy indentation.
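
The "unwrapping" step can be sketched in a few lines of Python. This
is an illustration of the idea only, under the assumption that blank
lines are the sole paragraph separators; ``unwrap_paragraphs`` is a
made-up name and nothing like it exists in the parser.

```python
def unwrap_paragraphs(text):
    """Join hard-wrapped lines into logical paragraphs.

    Blank lines separate paragraphs.  The indentation of a
    paragraph's first line is recorded as the indentation of the
    whole element; later lines may be indented or not.
    """
    paragraphs = []
    current = []
    for line in text.splitlines() + [""]:
        if line.strip():
            current.append(line)
        elif current:
            first = current[0]
            indent = len(first) - len(first.lstrip())
            body = " ".join(part.strip() for part in current)
            paragraphs.append((indent, body))
            current = []
    return paragraphs
```

Each result is an ``(indent, text)`` pair; deciding which pairs are
list items, continuation paragraphs, or top-level text would still
be up to the rest of the parser.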

The following example shows the lazy indentation of multiple body

    - This is the first paragraph
    of the first list item.

      Here is the second paragraph
    of the first list item.

    - This is the first paragraph
    of the second list item.

      Here is the second paragraph
    of the second list item.

A more complex example shows the limitations of lazy indentation::

    - This is the first paragraph
    of the first list item.

      Next is a definition list item:

          Definition. The indentation of the term is
    required, as is the indentation of the definition's

          When the definition extends to more than
    one line, lazy indentation may occur. (This is the second
    paragraph of the definition.)

    - This is the first paragraph
    of the second list item.

      - Here is the first paragraph of
    the first item of a nested list.

      So this paragraph would be outside of the nested list,
    but inside the second list item of the outer list.

    But this paragraph is not part of the list at all.

And the ambiguity remains::

    - Look at the hyphen at the beginning of the next line
    - is it a second list item marker, or a dash in the text?

Similarly, we may want to refer to numbers inside enumerated

    1. How many socks in a pair? There are
    2. How many pants in a pair? Exactly

Literal blocks and block quotes would still require consistent
indentation for all their lines. For block quotes, we might be able
to get away with only requiring that the first line of each contained
element be indented. For example::

        This is a paragraph inside a block quote.
    Second and subsequent lines need not be indented at all.

        - A bullet list inside

          Second paragraph of the
    bullet list inside the block quote.

Although feasible, this form of lazy indentation has problems. The
document structure and hierarchy are not obvious from the indentation,
making the source plaintext difficult to read. This will also make
keeping track of the indentation while writing difficult and
error-prone. However, these problems may be acceptable for Wikis and
email mode, where we may be able to rely on less complex structure
(few nested lists, for example).


Multiple Roles in Interpreted Text
==================================

In reStructuredText, inline markup cannot be nested (yet; `see
above`__). This also applies to interpreted text. In order to
simultaneously combine multiple roles for a single piece of text, a
syntax extension would be necessary. Ideas:

       `interpreted text`:role1,role2:

2. Suggested by Jason Diamond::

       `interpreted text`:role1:role2:

If a document is so complex as to require nested inline markup,
perhaps another markup system should be considered. By design,
reStructuredText does not have the flexibility of XML.

__ `Nested Inline Markup`_
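
Both proposed spellings carry the same information, as the following
Python sketch shows by reducing each to a text plus a list of role
names. The pattern and the function ``parse_roles`` are invented for
this example and only handle the suffix-role form shown above.

```python
import re

INTERPRETED = re.compile(r"^`(?P<text>[^`]+)`:(?P<roles>[\w,:.-]+):$")


def parse_roles(markup):
    """Return (text, [roles]) for either proposed form:
    `text`:role1,role2:  or  `text`:role1:role2:"""
    match = INTERPRETED.match(markup)
    if match is None:
        raise ValueError("not suffix-role interpreted text: %r" % markup)
    roles = re.split(r"[,:]", match.group("roles"))
    return match.group("text"), roles
```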


Parameterized Interpreted Text
==============================

In some cases it may be expedient to pass parameters to interpreted
text, analogous to function calls. Ideas:

1. Parameterize the interpreted text role itself (suggested by Jason

       `interpreted text`:role1(foo=bar):

   Positional parameters could also be supported::

       `CSS`:acronym(Cascading Style Sheets): is used for HTML, and
       `CSS`:acronym(Content Scrambling System): is used for DVDs.

   Technical problem: current interpreted text syntax does not
   recognize roles containing whitespace. Design problem: this smells
   like programming language syntax, but reStructuredText is not a
   programming language.

2. Put the parameters inside the interpreted text::

       `CSS (Cascading Style Sheets)`:acronym: is used for HTML, and
       `CSS (Content Scrambling System)`:acronym: is used for DVDs.

   Although this could be defined on an individual basis (per role),
   we ought to have a standard. Hyperlinks with embedded URIs already
   use angle brackets; perhaps they could be used here too::

       `CSS <Cascading Style Sheets>`:acronym: is used for HTML, and
       `CSS <Content Scrambling System>`:acronym: is used for DVDs.

   Do angle brackets connote URLs too much for this to be acceptable?
   How about the "tag" connotation -- does it save them or doom them?

3. `Nested inline markup`_ could prove useful here::

       `CSS :def:`Cascading Style Sheets``:acronym: is used for HTML,
       and `CSS :def:`Content Scrambling System``:acronym: is used for

   Inline markup roles could even define the default roles of nested
   inline markup, allowing this cleaner syntax::

       `CSS `Cascading Style Sheets``:acronym: is used for HTML, and
       `CSS `Content Scrambling System``:acronym: is used for DVDs.
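
For the angle-bracket variant in idea 2, a role implementation would
need to separate the visible text from the embedded parameter. A
minimal Python sketch follows; ``split_parameter`` is a hypothetical
helper, not an existing Docutils function, and it assumes the
parameter is always the final ``<...>`` group of the content.

```python
import re

PARAM = re.compile(r"^(?P<text>.*?)\s*<(?P<param>[^<>]+)>$")


def split_parameter(content):
    """Split interpreted-text content such as
    "CSS <Cascading Style Sheets>" into (text, parameter).
    The parameter is None when no trailing <...> group is present."""
    match = PARAM.match(content)
    if match is None:
        return content, None
    return match.group("text"), match.group("param")
```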

Does this push inline markup too far? Readability becomes a serious
issue. Substitutions may provide a better alternative (at the expense
of verbosity and duplication) by pulling the details out of the text

    |CSS| is used for HTML, and |CSS-DVD| is used for DVDs.

    .. |CSS| acronym:: Cascading Style Sheets
    .. |CSS-DVD| acronym:: Content Scrambling System

----------------------------------------------------------------------

This whole idea may be going beyond the scope of reStructuredText.
Documents requiring this functionality may be better off using XML or
another markup system.

This argument comes up regularly when pushing the envelope of
reStructuredText syntax. I think it's a useful argument in that it
provides a check on creeping featurism. In many cases, the resulting
verbosity produces such unreadable plaintext that there's a natural
desire *not* to use it unless absolutely necessary. It's a matter of
finding the right balance.


Syntax for Interpreted Text Role Bindings
=========================================

The following syntax (idea from Jeffrey C. Jacobs) could be used to
associate directives with roles::

    .. :rewrite: class:: rewrite

    `She wore ribbons in her hair and it lay with streaks of

The syntax is similar to that of substitution declarations, and the
directive/role association may resolve implementation issues. The
semantics, ramifications, and implementation details would need to be

The example above would implement the "rewrite" role as adding a
``class="rewrite"`` attribute to the interpreted text ("inline"
element). The stylesheet would then pick up on the "class" attribute
to do the actual formatting.

The advantage of the new syntax would be flexibility. Uses other than
"class" may present themselves. The disadvantage is complexity:
having to implement new syntax for a relatively specialized operation,
and having new semantics in existing directives ("class::" would do
something different).

The `"role" directive`__ has been implemented.

__ ../../ref/rst/directives.html#role


Character Processing
====================

Several people have suggested adding some form of character processing
to reStructuredText:

* Some sort of automated replacement of ASCII sequences:

  - ``--`` to em-dash (or ``--`` to en-dash, and ``---`` to em-dash).
  - Convert quotes to curly quote entities. (Essentially impossible
    for HTML? Unnecessary for TeX.)
  - Various forms of ``:-)`` to smiley icons.
  - ``"\ "`` to &nbsp;. Problem with line-wrapping though: it could
    end up escaping the newline.
  - Escaped newlines to <BR>.
  - Escaped period or quote or dash as a disappearing catalyst to
    allow character-level inline markup?

* XML-style character entities, such as "&copy;" for the copyright

Docutils has no need of a character entity subsystem. Because Docutils
supports Unicode and text encodings, character entities should be
directly represented in the text: a copyright symbol should be
represented by the copyright symbol character. If this is not possible
in an authoring environment, a pre-processing stage can be added, or a
table of substitution definitions can be devised.
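
As one possible shape for such a pre-processing stage, the standard
library's ``html.unescape`` already maps named and numeric character
references to Unicode characters. The wrapper name
``expand_entities`` is an assumption of this sketch, and running it
over a whole source file is a simplification.

```python
import html


def expand_entities(source):
    """Replace HTML/XML-style character references (named or numeric)
    with the Unicode characters they stand for, before the text
    reaches the reStructuredText parser."""
    return html.unescape(source)
```

Applied blindly, this would also rewrite references that appear
inside literal blocks or inline literals, so a real pre-processor
would have to be more selective.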
2955
A "unicode" directive has been implemented to allow direct
2956
specification of esoteric characters. In combination with the
2957
substitution construct, "include" files defining common sets of
2958
character entities can be defined and used. `A set of character
2959
entity set definition files have been defined`__ (`tarball`__).
2960
There's also `a description and instructions for use`__.
2962
__ http://docutils.sf.net/tmp/charents/
2963
__ http://docutils.sf.net/tmp/charents.tgz
2964
__ http://docutils.sf.net/tmp/charents/README.html
2966
To allow for `character-level inline markup`_, a limited form of
2967
character processing has been added to the spec and parser: escaped
2968
whitespace characters are removed from the processed document. Any
2969
further character processing will be of this functional type, rather
2970
than of the character-encoding type.
2972
.. _character-level inline markup:
2973
../../ref/rst/restructuredtext.html#character-level-inline-markup

::

    .. text-replace:: "pattern" "replacement"

- Support Unicode "U+XXXX" codes.
- Support regexps, perhaps with alternative "regexp-replace"
  directive.
- Flags for regexps; ":flags:" option, or individuals.
- Specifically, should the default be case-sensitive or
  case-insensitive?

* Should ^L (or something else in reST) be defined to mean
  force/suggest page breaks in whatever output we have?

  A "break" or "page-break" directive would be easy to add. A new
  doctree element would be required though (perhaps "break"). The
  final behavior would be up to the Writer. The directive argument
  could be one of page/column/recto/verso for added flexibility.

  Currently ^L (Python's ``\f``) characters are treated as whitespace.
  They're converted to single spaces, actually, as are vertical tabs
  (^K, Python's ``\v``). It would be possible to recognize form feeds
  as markup, but it requires some thought and discussion first. Are
  there any downsides? Many editing environments do not allow the
  insertion of control characters. Will it cause any harm? It would
  be useful as a shorthand for the directive.

  It's common practice to use ^L before Emacs "Local Variables"
  lists::

      ^L
      ..
         Local Variables:
         indent-tabs-mode: nil
         sentence-end-double-space: t
         End:

  These are already present in many PEPs and Docutils project
  documents. From the Emacs manual (info):

      A "local variables list" goes near the end of the file, in the
      last page. (It is often best to put it on a page by itself.)

  It would be unfortunate if this construct caused a final blank page
  to be generated (for those Writers that recognize the page breaks).
  We'll have to add a transform that looks for a "break" plus zero or
  more comments at the end of a document, and removes them.

  Probably a bad idea because there is no such thing as a page in a
  generic document format.

* Could the "break" concept above be extended to inline forms?
  E.g. "^L" in the middle of a sentence could cause a line break.
  Only recognize it at the end of a line (i.e., ``\f\n``)?

  Or is formfeed inappropriate? Perhaps vertical tab (``\v``), but
  even that's a stretch. Can't use carriage returns, since they're
  commonly used for line endings.

  Probably a bad idea as well: control characters have no place in
  markup that is meant to be easy to read and write, and the line
  block syntax already provides line breaks.
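
As a brief illustration of the existing alternative, a line block
preserves line breaks without any control characters (the address
below is made up for this example)::

    | Docutils Project
    | 123 Example Street
    | Anytown

Each line that starts with a vertical bar and a space begins a new
line in the output.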

Add ``^superscript^`` inline markup? The only common non-markup uses
of "^" I can think of are as shorthand for "superscript" itself and
for describing control characters ("^C to cancel"). The former
supports the proposed syntax, and it could be argued that the latter
ought to be literal text anyhow (e.g. "``^C`` to cancel").

However, superscripts are seldom needed, and new syntax would break
existing documents. When one is needed, the ``:superscript:``
(``:sup:``) role can be used instead.
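
For instance, a short sketch of the role combined with an escaped
space, so that the superscript attaches directly to the preceding
text (the formula is only an example)::

    E = mc\ :sup:`2`

The escaped space is removed from the output, leaving the "2" set as
a superscript immediately after "mc".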

Add the following directives?

- "exec": Execute Python code & insert the results. Call it
  "python" to allow for other languages?

- "system": Execute an ``os.system()`` call, and insert the results
  (possibly as a literal block). Definitely dangerous! How to make
  it safe? Perhaps such processing should be left outside of the
  document, in the user's production system (a makefile or a script or
  whatever). Or, the directive could be disabled by default and only
  enabled with an explicit command-line option or config file setting.
  Even then, an interactive prompt may be useful, such as:

      The file.txt document you are processing contains a "system"
      directive requesting that the ``sudo rm -rf /`` command be
      executed. Allow it to execute? (y/N)

- "eval": Evaluate an expression & insert the text. At parse
  time or at substitution time? Dangerous? Perhaps limit to canned
  macros; see text.date_.

.. _text.date: ../todo.html#text-date

These directives are too dangerous (or, in the case of "eval", too
complicated). We do not want to have such things in the core.
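
As a sketch of the "canned macro" alternative, the existing "date"
directive inserts the processing date through a substitution, with no
arbitrary code execution. The substitution name ``|today|`` and the
format string are illustrative choices::

    .. |today| date:: %Y-%m-%d

    This document was generated on |today|.

The directive accepts only a date format string, so the document
cannot run arbitrary code through it.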


``encoding`` Directive
======================

Add an "encoding" directive to specify the character encoding of the
input data? Not a good idea for the following reasons:

- When it sees the directive, the parser will already have read the
  input data, and encoding determination will already have been done.

- If a file with an "encoding" directive is edited and saved with
  a different encoding, the directive may cause data corruption.


Support for Annotations
=======================

Add an "annotation" role, as the equivalent of the HTML "title"
attribute? This is secondary information that may "pop up" when the
pointer hovers over the main text. A corresponding directive would be
required to associate annotations with the original text (by name, or
positionally as in anonymous targets?).

There have not been many requests for such a feature, though. Also,
cluttering WYSIWYG plaintext with annotations may not seem like a good
idea, and there is no "tool tip" in formats other than HTML.

Add a "term" role for unfamiliar or specialized terminology? Probably
not; there is no real use case, and emphasis is enough for most cases.

..
   Local Variables:
   indent-tabs-mode: nil
   sentence-end-double-space: t
   End: