<!DOCTYPE refentry PUBLIC 
   "-//OASIS//DTD DocBook XML V4.1.2//EN"
   "docbook/docbookx.dtd">
<refentry id='doclifter.1'>
<refmeta>
<refentrytitle>doclifter</refentrytitle>
<manvolnum>1</manvolnum>
<refmiscinfo class='date'>Aug 16 2001</refmiscinfo>
<refmiscinfo class='source'>doclifter</refmiscinfo>
<refmiscinfo class='productname'>doclifter</refmiscinfo>
<refmiscinfo class='manual'>Documentation Tools</refmiscinfo>
</refmeta>
<refnamediv id='name'>
<refname>doclifter</refname>
<refpurpose>translate troff requests into DocBook</refpurpose>
</refnamediv>
<refsynopsisdiv id='synopsis'>

<cmdsynopsis>
  <command>doclifter</command>  
  <arg choice='opt'>-o <replaceable>output-location</replaceable></arg>
  <arg choice='opt'>-e <replaceable>output-encoding</replaceable></arg>
  <arg choice='opt'>-i <replaceable>input-encodings</replaceable></arg>
  <arg choice='opt'>-h <replaceable>hintfile</replaceable></arg>
  <arg choice='opt'>-q</arg>
  <arg choice='opt'>-x</arg>
  <arg choice='opt'>-v</arg>
  <arg choice='opt'>-w</arg>
  <arg choice='opt'>-V</arg>
  <arg choice='opt'>-D <replaceable>token=type</replaceable></arg>
  <arg choice='opt'>-I <replaceable>path</replaceable></arg>
  <arg choice='opt'>-S <replaceable>spoofname</replaceable></arg>
  <arg choice='plain' rep='repeat'><replaceable>file</replaceable></arg>
</cmdsynopsis>

</refsynopsisdiv>

<refsect1><title>Description</title>
<para><command>doclifter</command>
translates documents written in troff macros to DocBook.  Structural
subsets of the requests in
<citerefentry><refentrytitle>man</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>mdoc</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>ms</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>me</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>mm</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
and
<citerefentry><refentrytitle>troff</refentrytitle><manvolnum>1</manvolnum></citerefentry>
are supported.</para>

<para>The translation brings over all the structure of the original
document at section, subsection, and paragraph level.  Command and C
function synopses are translated into DocBook markup, not just a
verbatim display.  Tables (TBL markup) are translated into DocBook
table markup. PIC diagrams are translated into SVG.  Troff-level
information that might have structural implications is preserved in
XML comments.</para>

<para>Where possible, font-change macros are translated into
structural markup.  <command>doclifter</command> recognizes
stereotyped patterns of markup and content (such as the use of italics
in a FILES section to mark filenames) and lifts them.  A means to
edit, add, and save semantic hints about highlighting is supported.</para>

<para>Some cliches are recognized and lifted to structural markup 
even without highlighting.  Patterns recognized include
such things as URLs, email addresses, man page references, and
C program listings.</para>

<para>The <markup>.in</markup> and <markup>.ti</markup> requests are
passed through with complaints. They indicate presentation-level
markup that <command>doclifter</command> cannot translate into 
structure; the output will require hand-fixing.</para>

<para>The <markup>.ta</markup> request is passed through with a complaint
unless the immediately following text lines contain a tab, in
which case the following span of lines containing tabs is lifted to a
table.</para>

<para>Under some circumstances, <command>doclifter</command> can even
lift formatted manual pages and the text output produced by
<citerefentry><refentrytitle>lynx</refentrytitle><manvolnum>1</manvolnum></citerefentry>
from HTML.  If it finds no macros in the input, but does find a NAME
section header, it tries to interpret the plain text as a manual page
(skipping boilerplate headers and footers generated by
<citerefentry><refentrytitle>lynx</refentrytitle><manvolnum>1</manvolnum></citerefentry>).
Translations produced in this way will be prone to miss structural
features, but this fallback is good enough for simple man
pages.</para>

<para><command>doclifter</command> does not do a perfect job, merely
a surprisingly good one.  Final polish should be applied by a human
being capable of recognizing patterns too subtle for a computer.  But
<command>doclifter</command> will almost always produce translations
that are good enough to be usable before hand-hacking.</para>

<para>See the <link linkend="troubleshooting">Troubleshooting</link>
section for discussion of how to solve document conversion
problems.</para>

</refsect1>

<refsect1><title>Options</title>
<para>If called without arguments <command>doclifter</command> acts as
a filter, translating troff source input on standard input to DocBook
markup on standard output.  If called with arguments, each argument
file is translated separately (but hints are retained, see below); the
suffix <filename>.xml</filename> is given to the translated output.</para>
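<para>As a rough illustration of the two modes just described (the
file names <filename>foo.1</filename> and <filename>bar.1</filename>
are hypothetical, and the output names assume that the
<filename>.xml</filename> suffix is simply appended to each input
name):</para>

<screen>
# Filter mode: troff on standard input, DocBook on standard output
doclifter &lt;foo.1 &gt;foo.xml

# File mode: each argument is translated separately,
# here presumably yielding foo.1.xml and bar.1.xml
doclifter foo.1 bar.1
</screen>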
<variablelist>
<varlistentry>
<term>-o</term>
<listitem>
<para>Set the output location where files will be saved. Defaults to the
current working directory.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-h</term>
<listitem>
<para>Name a file to which information on semantic hints gathered
during analysis should be written.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-D</term>
<listitem>
<para>The <option>-D</option> option allows you to post a hint. This may be
useful, for example, if <command>doclifter</command> is mis-parsing 
a synopsis because it doesn't recognize a token as a command.  This
hint is merged after hints in the input source have been read.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-I</term>
<listitem>
<para>The <option>-I</option> option adds its argument to the include
path used when <command>doclifter</command> searches for inclusions.  The include path is
initially just the current directory.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-S</term>
<listitem>
<para>Set the filename to be used in error and warning messages. This
is mainly intended for use by test scripts.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-e</term>
<listitem>
<para>The <option>-e</option> option allows you to set the output encoding of
the XML and the encoding field to be emitted in its header.  It
defaults to UTF-8.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-i</term>
<listitem>
<para>The <option>-i</option> option allows you to set a comma-separated list of
encodings to be looked for in the input. The default is
"ISO-8859-1,UTF-8", which should cover almost all cases.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-q</term>
<listitem>
<para>Normally, requests that <command>doclifter</command> could not
interpret (usually because they're presentation-level) are passed
through to XML comments in the output.  The -q option suppresses
this.  It also suppresses listing of macros.  Messages about requests
that are unrecognized or cannot be translated go to standard error
whatever the state of this option.  This option is intended to reduce 
clutter when you believe you have a clean lift of a document and want
to lose the troff legacy.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-x</term>
<listitem>
<para>The -x option requests that <command>doclifter</command>
generate DocBook version 5 compatible XML content, rather than its
default DocBook version 4.4 output. Inclusions and entities
may not be handled correctly with this switch enabled.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-v</term>
<listitem>
<para>The -v option makes <command>doclifter</command>
noisier about what it's doing.  This is mainly useful for debugging.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-w</term>
<listitem>
<para>Enable strict portability checking.  Multiple instances of
-w increase the strictness.  See <xref linkend='portability'/>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-V</term>
<listitem>
<para>With this option, the program emits a version message and exits.</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>

<refsect1><title>Translation Rules</title> 

<para>Overall, you can expect that font changes will be turned into
<sgmltag class='element'>Emphasis</sgmltag> elements with a <sgmltag
class='attribute'>Remap</sgmltag> attribute taken from the troff font
name.  The basic font names are R, I, B, U, CW, and SM.</para>
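<para>A minimal sketch of what this means in practice, assuming an
inline bold font change around the hypothetical word
<literal>foo</literal>; the exact output may differ in whitespace and
surrounding context, and a recognized pattern may be lifted to more
specific markup instead:</para>

<programlisting>
<![CDATA[
Input:   See \fBfoo\fP for details.
Output:  See <emphasis remap='B'>foo</emphasis> for details.
]]>
</programlisting>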

<para>Troff and macro-package special character escapes are mapped into ISO
character entities.</para>

<para>When <command>doclifter</command> encounters a
<markup>.so</markup> directive, it searches for the file.  If it can
get read access to the file, and open it, and the file consists
entirely of command lines and comments, then it is included.  If any
of these conditions fails, an entity reference for it is
generated.</para>

<para><command>doclifter</command> performs special parsing when it
recognizes a display such as is generated by
<markup>.DS/.DE</markup>. It repeatedly tries to parse first a
function synopsis, and then plain text off what remains in the
display.  Thus, most inline C function prototypes will be
lifted to structured markup.</para>

<para>Some notes on specific translations:</para>

<refsect2><title>Man Translation</title>

<para><command>doclifter</command> does a good job on most man pages.
It knows about the extended
<markup>UR</markup>/<markup>UE</markup>/<markup>UN</markup> and
<markup>URL</markup> requests supported under Linux.  If any
<markup>.UR</markup> request is present, it will translate these but
not wrap URLs outside them with <sgmltag>Ulink</sgmltag> tags. It also
knows about the extended <markup>.L</markup> (literal) font markup
from Bell Labs Version 8, and its friends.</para>

<para>The <markup>.TH</markup> macro is used to generate a <sgmltag
class='element'>RefMeta</sgmltag> section.  If present, the
date/source/manual arguments (see
<citerefentry><refentrytitle>man</refentrytitle><manvolnum>7</manvolnum></citerefentry>)
are wrapped in <sgmltag class='element'>RefMiscInfo</sgmltag> tag
pairs with those class attributes.  Note that
<command>doclifter</command> does not change the date.</para>
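<para>For illustration, a <markup>.TH</markup> line carrying the same
values as this page's own metadata would be expected to yield roughly
the following.  The troff line is a reconstruction made up for this
example, not taken from a real source file, and the exact output may
differ:</para>

<programlisting>
<![CDATA[
.TH doclifter 1 "Aug 16 2001" "doclifter" "Documentation Tools"

<refmeta>
<refentrytitle>doclifter</refentrytitle>
<manvolnum>1</manvolnum>
<refmiscinfo class='date'>Aug 16 2001</refmiscinfo>
<refmiscinfo class='source'>doclifter</refmiscinfo>
<refmiscinfo class='manual'>Documentation Tools</refmiscinfo>
</refmeta>
]]>
</programlisting>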

<para><command>doclifter</command> performs special parsing when it
recognizes a synopsis section. It repeatedly tries to parse first a
function synopsis, then a command synopsis, and then plain text off
what remains in the section.</para>

<para>The following man macros are translated into emphasis tags with
a remap attribute: <markup>.B</markup>, <markup>.I</markup>,
<markup>.L</markup>, <markup>.BI</markup>, <markup>.BR</markup>,
<markup>.BL</markup>, <markup>.IB</markup>, <markup>.IR</markup>,
<markup>.IL</markup>, <markup>.RB</markup>, <markup>.RI</markup>,
<markup>.RL</markup>, <markup>.LB</markup>, <markup>.LI</markup>,
<markup>.LR</markup>, <markup>.SB</markup>, <markup>.SM</markup>.
Some stereotyped patterns involving these macros are recognized and
turned into semantic markup.</para>

<para>The following macros are translated into paragraph breaks:
<markup>.LP</markup>, <markup>.PP</markup>, <markup>.P</markup>,
<markup>.HP</markup>, and the single-argument form
of <markup>.IP</markup>.</para>

<para>The two-argument form of <markup>.IP</markup>
is translated either as a <sgmltag
class='element'>VariableList</sgmltag> (usually) or <sgmltag
class='element'>ItemizedList</sgmltag> (if the tag is the troff bullet
or square character).</para>
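<para>A sketch of the usual case, assuming a hypothetical tag
<literal>-v</literal>.  The element nesting follows the DocBook
<sgmltag class='element'>VariableList</sgmltag> content model, but the
actual output is not guaranteed to match line for line:</para>

<programlisting>
<![CDATA[
.IP -v 4
Be verbose.

<variablelist>
<varlistentry>
<term>-v</term>
<listitem>
<para>Be verbose.</para>
</listitem>
</varlistentry>
</variablelist>
]]>
</programlisting>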

<para>The following macros are translated semantically:
<markup>.SH</markup>, <markup>.SS</markup>, <markup>.TP</markup>,
<markup>.UR</markup>, <markup>.UE</markup>, <markup>.UN</markup>,
<markup>.IX</markup>.  A <markup>.UN</markup> call just before
<markup>.SH</markup> or <markup>.SS</markup> sets the ID for the
new section.</para>

<para>The <markup>\*R</markup>, <markup>\*(Tm</markup>,
<markup>\*(lq</markup>, and <markup>\*(rq</markup> symbols are
translated.</para>

<para>The following (purely presentation-level) macros are ignored:
<markup>.PD</markup>, <markup>.DT</markup>.</para>

<para>The <markup>.RS</markup>/<markup>.RE</markup> macros are
translated differently depending on whether or not they precede
list markup.  When <markup>.RS</markup> occurs just before <markup>.TP</markup>
or <markup>.IP</markup> the result is nested lists. Otherwise, the
<markup>.RS</markup>/<markup>.RE</markup> pair is translated into a
<sgmltag>Blockquote</sgmltag> tag-pair.</para>

<para><markup>.DS</markup>/<markup>.DE</markup> is not part of the
documented man macro set, but is recognized because it shows up with
some frequency on legacy man pages from older Unixes.  <!-- It triggers
display processing. --></para>

<para>Certain extension macros originally defined under Ultrix are
translated structurally, including those that occasionally show up on
the manual pages of Linux and other open-source Unixes.
<markup>.EX</markup>/<markup>.EE</markup> (and the synonyms
<markup>.Ex</markup>/<markup>.Ee</markup>),
<markup>.Ds</markup>/<markup>.De</markup>, <!-- cause display
parsing. --> <markup>.NT</markup>/<markup>.NE</markup>,
<markup>.PN</markup>, and <markup>.MS</markup> are translated
structurally.</para>

<para>The following extension macros used by the X distribution are
also recognized and translated structurally: <markup>.FD</markup>,
<markup>.FN</markup>, <markup>.IN</markup>, <markup>.ZN</markup>,
<markup>.hN</markup>, and <markup>.C{</markup>/<markup>.C}</markup>.
<!-- triggers display parsing.--> The <markup>.TA</markup> and
<markup>.IN</markup> requests are ignored.</para>

<para>When the man macros are active, any <markup>.Pp</markup> macro
definition containing the request <markup>.PP</markup> will be
ignored, and all instances of <markup>.Pp</markup> replaced with
<markup>.PP</markup>.  Similarly, <markup>.Tp</markup> will be
replaced with <markup>.TP</markup>.  This is the least painful way to
deal with some frequently-encountered stereotyped wrapper definitions
that would otherwise cause serious interpretation problems.</para>

<para>Known problem areas with man translation:</para>
<itemizedlist>
<listitem><para>Weird uses of <markup>.TP</markup>.
These will sometimes generate invalid XML and sometimes result in
a FIXME comment in the generated XML (a warning message will
also go to standard error).</para></listitem>

<listitem><para>It is debatable how the man macros
<markup>.HP</markup> and <markup>.IP</markup> without tag should be
translated.  We treat them as an ordinary paragraph break. We could
visually simulate a hanging paragraph with list markup, but this would
not be a structural translation.</para></listitem>
</itemizedlist>
</refsect2>

<refsect2><title>Pod2man Translation</title>
<para><command>doclifter</command>
recognizes the extension macros produced by
<command>pod2man</command>
(<markup>.Sh</markup>, <markup>.Sp</markup>, <markup>.Ip</markup>, <markup>.Vb</markup>, <markup>.Ve</markup>) and translates them
structurally.</para>

<para>The results of lifting pages produced by
<command>pod2man</command> should be checked carefully by eyeball,
especially the rendering of command and function
synopses. <command>Pod2man</command> generates rather perverse markup;
<command>doclifter</command>'s struggle to untangle it is sometimes in
vain.</para>

<para>If possible, generate your DocBook from the POD sources.  There
is a <application>pod2docbook</application> module on CPAN that does
this.</para>
</refsect2>

<refsect2><title>Tkman Translation</title>
<para><command>doclifter</command> recognizes the extension macros
used by the Tcl/Tk documentation system: <markup>.AP</markup>,
<markup>.AS</markup>, <markup>.BS</markup>, <markup>.BE</markup>,
<markup>.CS</markup>, <markup>.CE</markup>, <markup>.DS</markup>,
<markup>.DE</markup>, <markup>.SO</markup>, <markup>.SE</markup>,
<markup>.UL</markup>, <markup>.VS</markup>, <markup>.VE</markup>.  The
<markup>.AP</markup>, <markup>.CS</markup>, <markup>.CE</markup>,
<markup>.SO</markup>, <markup>.SE</markup>, <markup>.UL</markup>,
<markup>.QW</markup> and <markup>.PQ</markup>
macros are translated structurally.
<!-- <markup>.CS</markup>/<markup>.CE</markup> triggers display processing. -->
</para>
</refsect2>

<refsect2><title>Mandoc Translation</title>
<para><command>doclifter</command> should be able to do an excellent
job on most
<citerefentry><refentrytitle>mdoc</refentrytitle><manvolnum>7</manvolnum></citerefentry>
pages, because this macro package expresses a lot of semantic
structure.</para>

<para>Known problems with mandoc translation: All
<markup>.Bd</markup>/<markup>.Ed</markup> display blocks are
translated as <sgmltag class='element'>LiteralLayout</sgmltag> tag
pairs<!-- (and trigger display processing) -->.</para>
</refsect2>

<refsect2><title>Ms Translation</title>
<para><command>doclifter</command> does a good job on most ms pages.
One weak spot to watch out for is the generation of Author and
Affiliation tags.  The heuristics used to mine this information out of
the <markup>.AU</markup> section work for authors
who format their names in the way usual for English
(e.g. "M. E. Lesk", "Eric S. Raymond") but are quite brittle.</para>

<para>For a document to be recognized as containing ms markup, it must
have the extension <filename>.ms</filename>.  This avoids problems 
with false positives.</para>

<para>The <markup>.TL</markup>, <markup>.AU</markup>,
<markup>.AI</markup>, and <markup>.AE</markup> macros turn into
article metainformation in the expected way.  The
<markup>.PP</markup>, <markup>.LP</markup>, <markup>.SH</markup>, and
<markup>.NH</markup> macros turn into paragraph and section structure.
The tagged form of <markup>.IP</markup> is translated either as a
<sgmltag class='element'>VariableList</sgmltag> (usually) or <sgmltag
class='element'>ItemizedList</sgmltag> (if the tag is the troff bullet
or square character); the untagged version is treated as an ordinary
paragraph break.</para>

<para>The <markup>.DS</markup>/<markup>.DE</markup> pair is translated
to a <sgmltag class='element'>LiteralLayout</sgmltag> tag pair<!-- and
triggers display processing -->.  The
<markup>.FS</markup>/<markup>.FE</markup> pair is translated to a
<sgmltag class='element'>Footnote</sgmltag> tag pair.  The
<markup>.QP</markup>/<markup>.QS</markup>/<markup>.QE</markup>
requests define <sgmltag class='element'>BlockQuote</sgmltag> elements.</para>

<para>The <markup>.UL</markup> font change is mapped to U.
<markup>.SM</markup> and <markup>.LG</markup> become numeric plus or
minus size steps suffixed to the <sgmltag
class='attribute'>Remap</sgmltag> attribute.</para>

<para>The <markup>.B1</markup> and <markup>.B2</markup> box macros are
translated to a <sgmltag class='element'>Sidebar</sgmltag> tag
pair.</para>

<para>All macros relating to page footers, multicolumn mode, and keeps
are ignored (<markup>.ND</markup>, <markup>.DA</markup>,
<markup>.1C</markup>, <markup>.2C</markup>, <markup>.MC</markup>,
<markup>.BX</markup>, <markup>.KS</markup>, <markup>.KE</markup>,
<markup>.KF</markup>).  The <markup>.R</markup>, <markup>.RS</markup>,
and <markup>.RE</markup> macros are ignored as well.
</para>
</refsect2>

<refsect2><title>Me Translation</title>
<para>Translation of me documents tends to produce crude results that need
a lot of hand-hacking.  The format has little usable structure, and
documents written in it tend to use a lot of low-level troff macros;
both these properties tend to confuse <command>doclifter</command>.</para>

<para>For a document to be recognized as containing me markup, it must
have the extension <filename>.me</filename>.  This avoids problems 
with false positives.</para>

<para>The following macros are translated into paragraph breaks:
<markup>.lp</markup>, <markup>.pp</markup>.  The <markup>.ip</markup>
macro is translated into a <sgmltag
class='element'>VariableList</sgmltag>.  The <markup>.bp</markup>
macro is translated into an <sgmltag
class='element'>ItemizedList</sgmltag>.  The <markup>.np</markup>
macro is translated into an <sgmltag
class='element'>OrderedList</sgmltag>.</para>

<para>The b, i, and r fonts are mapped to emphasis tags with B, I, and
R <sgmltag class='attribute'>Remap</sgmltag> attributes.  The
<markup>.rb</markup> ("real bold") font is treated the same as
<markup>.b</markup>.</para>

<para><markup>.q(</markup>/<markup>.q)</markup> is translated
structurally <!-- triggers display processing -->.</para>

<para>Most other requests are ignored.</para>
</refsect2>

<refsect2><title>Mm Translation</title>
<para>Memorandum Macros documents translate well, as these macros
carry a lot of structural information.  The translation rules are
tuned for Memorandum or Released Paper styles; information associated
with external-letter style will be preserved in comments.</para>

<para>For a document to be recognized as containing mm markup, it must
have the extension <filename>.mm</filename>.  This avoids problems 
with false positives.</para>

<para>The following highlight macros are translated into Emphasis tags:
<markup>.B</markup>, <markup>.I</markup>, <markup>.R</markup>,
<markup>.BI</markup>, <markup>.BR</markup>, <markup>.IB</markup>,
<markup>.IR</markup>, <markup>.RB</markup>, <markup>.RI</markup>.
</para>

<para>The following macros are structurally translated:
<markup>.AE</markup>, <markup>.AF</markup>, <markup>.AL</markup>,
<markup>.RL</markup>, <markup>.APP</markup>, <markup>.APPSK</markup>,
<markup>.AS</markup>, <markup>.AT</markup>, <markup>.AU</markup>,
<markup>.B1</markup>, <markup>.B2</markup>, <markup>.BE</markup>,
<markup>.BL</markup>, <markup>.ML</markup>, <markup>.BS</markup>,
<markup>.BVL</markup>, <markup>.VL</markup>, <markup>.DE</markup>,
<markup>.DL</markup>, <markup>.DS</markup>, <markup>.FE</markup>,
<markup>.FS</markup>, <markup>.H</markup>, <markup>.HU</markup>,
<markup>.IA</markup>, <markup>.IE</markup>, <markup>.IND</markup>,
<markup>.LB</markup>, <markup>.LC</markup>, <markup>.LE</markup>,
<markup>.LI</markup>, <markup>.P</markup>, <markup>.RF</markup>,
<markup>.SM</markup>, <markup>.TL</markup>, <markup>.VERBOFF</markup>,
<markup>.VERBON</markup>, <markup>.WA</markup>, <markup>.WE</markup>.
<!-- <markup>.DS</markup>/<markup>.DE</markup> triggers display processing. -->
</para>

<para>The following macros are ignored:</para>

<para>&nbsp;<markup>.)E</markup>, <markup>.1C</markup>,
<markup>.2C</markup>, <markup>.AST</markup>, <markup>.AV</markup>,
<markup>.AVL</markup>, <markup>.COVER</markup>,
<markup>.COVEND</markup>, <markup>.EF</markup>, <markup>.EH</markup>,
<markup>.EDP</markup>, <markup>.EPIC</markup>, <markup>.FC</markup>,
<markup>.FD</markup>, <markup>.HC</markup>, <markup>.HM</markup>,
<markup>.GETR</markup>, <markup>.GETST</markup>,
<markup>.INITI</markup>, <markup>.INITR</markup>,
<markup>.INDP</markup>, <markup>.ISODATE</markup>,
<markup>.MT</markup>, <markup>.NS</markup>, <markup>.ND</markup>,
<markup>.OF</markup>, <markup>.OH</markup>, <markup>.OP</markup>,
<markup>.PGFORM</markup>, <markup>.PGNH</markup>,
<markup>.PE</markup>, <markup>.PF</markup>, <markup>.PH</markup>,
<markup>.RP</markup>, <markup>.S</markup>, <markup>.SA</markup>,
<markup>.SP</markup>, <markup>.SG</markup>, <markup>.SK</markup>,
<markup>.TAB</markup>, <markup>.TB</markup>, <markup>.TC</markup>,
<markup>.VM</markup>, <markup>.WC</markup>.</para>

<para>The following macros generate warnings: <markup>.EC</markup>,
<markup>.EX</markup>, <markup>.GETHN</markup>,
<markup>.GETPN</markup>, <markup>.GETR</markup>,
<markup>.GETST</markup>, <markup>.LT</markup>, <markup>.LD</markup>,
<markup>.LO</markup>, <markup>.MOVE</markup>, <markup>.MULB</markup>,
<markup>.MULN</markup>, <markup>.MULE</markup>,
<markup>.NCOL</markup>, <markup>.nP</markup>, <markup>.PIC</markup>,
<markup>.RD</markup>, <markup>.RS</markup>, <markup>.RE</markup>, 
<markup>.SETR</markup>.
</para>

<para>Pairs of <markup>.DS</markup>/<markup>.DE</markup> are
interpreted as informal figures.  If an <markup>.FG</markup> is
present it becomes a caption element.</para>

<para>&nbsp;<markup>.BS</markup>/<markup>.BE</markup> and
<markup>.IA</markup>/<markup>.IE</markup> pairs are passed through.
The text inside them may need to be deleted or moved.</para>

<para>The mark argument of <markup>.ML</markup> is
ignored; the following list is formatted as a normal <sgmltag
class='element'>ItemizedList</sgmltag>.</para>

<para>The contents of <markup>.DS</markup>/<markup>.DE</markup> or
<markup>.DF</markup>/<markup>.DE</markup> are turned into a <sgmltag
class='element'>Screen</sgmltag> display.  Arguments controlling
presentation-level formatting are ignored.</para>

</refsect2>

<refsect2><title>Mwww Translation</title> <para>The mwww macros are an
extension to the man macros supported by
<citerefentry><refentrytitle>groff</refentrytitle><manvolnum>1</manvolnum></citerefentry>
for producing web pages.</para>

<para>The <markup>URL</markup>, <markup>FTP</markup>,
<markup>MAILTO</markup>, <markup>IMAGE</markup>, and
<markup>TAG</markup> tags are translated structurally.  The
<markup>HTMLINDEX</markup>, <markup>BODYCOLOR</markup>,
<markup>BACKGROUND</markup>, <markup>HTML</markup>, and
<markup>LINE</markup> tags are ignored.</para>
</refsect2>

<refsect2><title>TBL Translation</title> 

<para>All structural features of TBL tables are translated, including
both horizontal and vertical spanning with &lsquo;s&rsquo; and
&lsquo;^&rsquo;.  The &lsquo;l&rsquo;, &lsquo;r&rsquo;, and
&lsquo;c&rsquo; formats are supported; the &lsquo;n&rsquo; column
format is rendered as &lsquo;r&rsquo;. Line continuations with
<sgmltag class='element'>T{</sgmltag> and <sgmltag
class='element'>T}</sgmltag> are handled correctly.  So is
<markup>.TH</markup>.</para>

<para>The <markup>expand</markup>, <markup>box</markup>,
<markup>doublebox</markup>, <markup>allbox</markup>,
<markup>center</markup>, <markup>left</markup>, and
<markup>right</markup> options are supported.  The GNU synonyms
<markup>frame</markup> and <markup>doubleframe</markup> are also
recognized.  But the distinction between single and double rules and
boxes is lost.</para>

<para>Table continuations (<sgmltag class='element'>.T&amp;</sgmltag>)
are not supported.</para>

<para>If the first nonempty line of text immediately before a table is
boldfaced, it is interpreted as a title for the table and the table
is generated using a <sgmltag class='element'>table</sgmltag> and
<sgmltag class='element'>title</sgmltag>.  Otherwise the table is
translated with <sgmltag
class='element'>informaltable</sgmltag>.</para>

<para>Most other presentation-level TBL commands are ignored.
The &lsquo;b&rsquo; format qualifier is processed, but point size and width
qualifiers are not.</para>
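
<para>As an illustration of the features described above, here is a
small hypothetical table that uses only constructs this section lists
as supported.  Fields are separated by literal tab characters, and the
file names and sizes are invented for the example.  Because the line
before the table is boldfaced, the rule above suggests it would be
lifted to a <sgmltag class='element'>table</sgmltag> with a
<sgmltag class='element'>title</sgmltag>; the exact output may
differ.</para>

<programlisting>
\fBSample file sizes\fP
&nbsp;.TS
center box;
c s s
l r c.
Name and size of each file
Name	Bytes	Kind
alpha	10	text
beta	200	T{
a longer description
that continues on a second line
T}
&nbsp;.TE
</programlisting>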
</refsect2>

<refsect2><title>Pic Translation</title>
<para>PIC sections are translated to SVG.
<application>doclifter</application> calls out to
<citerefentry><refentrytitle>pic2plot</refentrytitle><manvolnum>1</manvolnum></citerefentry>
to accomplish this; you must have that utility installed for PIC
translation to work.</para>
</refsect2>

<refsect2><title>Eqn Translation</title> <para>EQN sections are
filtered into embedded MathML with
<command>eqn -TMathML</command> if possible, otherwise passed
through enclosed in <sgmltag class='element'>LiteralLayout</sgmltag> tags.
After a delim statement has been seen, inline eqn delimiters are
translated into an XML processing instruction. Exception: inline
eqn equations consisting of a single character are translated to an
<sgmltag>Emphasis</sgmltag> with a Role attribute of eqn.</para>
</refsect2>

<refsect2><title>Troff Translation</title>
<para>The troff translation is meant only to support interpretation of the
macro sets. It is not useful standalone.</para>

<para>The <markup>.nf</markup> and <markup>.fi</markup> macros are
interpreted as literal-layout boundaries.  Calls to the
<markup>.so</markup> macro either cause inclusion or are translated
into XML entity inclusions (see above).  Calls to the
<markup>.ul</markup> and <markup>.cu</markup> macros cause following
lines to be wrapped in an <sgmltag class='element'>Emphasis</sgmltag>
tag with a <sgmltag class='attribute'>Remap</sgmltag> attribute of
"U".  Calls to <markup>.ft</markup> generate corresponding start or
end emphasis tags.  Calls to <markup>.tr</markup> cause character
translation on output. Calls to <markup>.bp</markup> generate a
<sgmltag class='element'>BeginPage</sgmltag> tag (in paragraphed text
only). Calls to <markup>.sp</markup> generate a paragraph break (in
paragraphed text only).  Calls to <markup>.ti</markup> wrap the
following line in a <sgmltag class='element'>BlockQuote</sgmltag>.
These are the only troff requests we translate to DocBook.  The rest
of the troff emulation exists because macro packages use it internally
to expand macros into elements that might be structural.</para>

<para>Requests relating to macro definitions and strings
(<markup>.ds</markup>, <markup>.as</markup>, <markup>.de</markup>,
<markup>.am</markup>, <markup>.rm</markup>, <markup>.rn</markup>,
<markup>.em</markup>) are processed and expanded.  The
<markup>.ig</markup> macro is also processed.</para>

<para>Conditional macros (<markup>.if</markup>, <markup>.ie</markup>,
<markup>.el</markup>) are handled.  The built-in conditions o, n, t,
e, and c are evaluated as if for <application>nroff</application> on
page one of a document.  The m, d, and r troff conditionals
are also interpreted. String comparisons are evaluated by straight
textual comparison.  All numeric expressions evaluate to true. </para>

<para>The extended <application>groff</application> requests
<markup>cc</markup>, <markup>c2</markup>,
<markup>ab</markup>, <markup>als</markup>, <markup>do</markup>,
<markup>nop</markup>, <markup>return</markup>, and
<markup>shift</markup> are interpreted. Its <markup>.PSPIC</markup>
extension is translated into a <sgmltag
class='element'>MediaObject</sgmltag>.</para>

<para>The <markup>.tm</markup> macro writes its arguments to standard
error (with <option>-t</option>).  The <markup>.pm</markup> macro
reports on defined macros and strings.  These facilities may aid in
debugging your translation.</para>

<para>Some troff escape sequences are lifted:</para>

<orderedlist>
<listitem><para>The \e and \\ escapes become a bare backslash, \. a
period, and \- a bare dash.</para></listitem>

<listitem><para>The troff escapes \^, \`, \', \&amp;, \0, and \| are lifted
to equivalent ISO special spacing characters.</para></listitem>

<listitem><para>A \ followed by space is translated to an ISO non-breaking 
space entity.</para></listitem>

<listitem><para>A \~ is also translated to an ISO non-breaking space
entity; properly this should be a space that can't be used for a
linebreak but stretches like ordinary whitespace during line
adjustment, but there is no ISO or Unicode entity for
that.</para></listitem>

<listitem><para>The \u and \d half-line vertical motion
escapes, when paired, become <markup>Superscript</markup> or
<markup>Subscript</markup> tags.</para></listitem>

<listitem><para>The \c escape is handled as a line continuation in
circumstances where that matters (e.g. for
token-pasting).</para></listitem>

<listitem><para>The \f escape for font changes is translated in
various context-dependent ways. First, <command>doclifter</command>
looks for cliches involving font changes that have semantic meaning,
and lifts to a structural tag.  If it can't do that, it generates an
<sgmltag>Emphasis</sgmltag> tag.</para></listitem>

<listitem><para>The \m[] extension is translated into a
<sgmltag>phrase</sgmltag> span with a remap attribute carrying the
color.  Note: Stylesheets typically won't render
this!</para></listitem>

<listitem><para>Some uses of the \o request are translated: pairs with
a letter followed by one of the characters ` ' : ^ o ~ are translated
to combining forms with diacriticals grave, acute, umlaut, circumflex,
ring, and tilde respectively if the corresponding Latin-1 or Latin-2
character exists as an ISO literal.</para></listitem>

</orderedlist>
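
<para>To make the list above concrete, here are a few hypothetical
input lines, each annotated with a troff comment naming the rule it is
meant to exercise.  The sample text is invented, and the annotations
restate the rules above rather than showing verified output.</para>

<programlisting>
C:\eTEMP\efoo        \&quot; each \e becomes a bare backslash
x\u2\d + y\u2\d      \&quot; paired \u and \d become Superscript tags
na\o'i:'ve           \&quot; \o pairing a letter with : gives an umlaut form
see \fBfopen\fP      \&quot; \f change: a structural tag if recognized, else Emphasis
</programlisting>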

<para>Escapes other than these will yield warnings or errors.</para>

<para>All other troff requests are ignored but passed through into
XML comments.  A few (such as <markup>.ce</markup>) also trigger
a warning message.</para>
</refsect2>
</refsect1>

<refsect1 id="portability"><title>Portability Checking</title>

<para>When portability checking is enabled,
<command>doclifter</command> emits portability warnings about markup
which it can handle but which will break various other viewers and
interpreters.</para>

<orderedlist>
<listitem><para>At level 1, it will warn about constructions that
would break 
<citerefentry><refentrytitle>man2html</refentrytitle><manvolnum>1</manvolnum></citerefentry>
(the C program distributed with Linux
<citerefentry><refentrytitle>man</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
not the older and much less capable Perl script).  A close derivative of this
code is used in GNOME <application>yelp</application>.  This should
be the minimum level of portability you aim for, and corresponds to what is 
recommended on the 
<citerefentry><refentrytitle>groff_man</refentrytitle><manvolnum>7</manvolnum></citerefentry>
manual page.
</para></listitem>
<listitem><para>At level 2, it will warn about constructions that will
break portability back to the Unix classic tools (including long macro
names and glyph references with \[]).</para></listitem>
</orderedlist>
</refsect1>

<refsect1 id="hints"><title>Semantic analysis</title>
<para><command>doclifter</command> keeps two lists of semantic hints
that it picks up from analyzing source documents (especially from
parsing command and function synopses).  The local list includes:</para>

<itemizedlist>
<listitem><para>Names of function formal arguments</para></listitem>
<listitem><para>Names of command options</para></listitem>
</itemizedlist>

<para>Local hints are used to mark up the individual page from 
which they are gathered. The global list includes:</para>

<itemizedlist>
<listitem><para>Names of functions</para></listitem>
<listitem><para>Names of commands</para></listitem>
<listitem><para>Names of function return types</para></listitem>
</itemizedlist>

<para>If <command>doclifter</command> is applied to multiple files,
the global list is retained in memory.  You can dump a report of
global hints at the end of the run with the <option>-h</option>
option.  The format of the hints is as follows:</para>

<programlisting>
&nbsp;.\&quot; | mark &lt;phrase&gt; as &lt;markup&gt;
</programlisting>

<para>where <userinput>&lt;phrase&gt;</userinput> is an item of text
and <userinput>&lt;markup&gt;</userinput> is the DocBook markup text
it should be wrapped with whenever it appears either highlighted
or as a word surrounded by whitespace in the source text.</para>
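
<para>For instance, hint lines following this format might look like
the ones below.  The phrases are invented, and the assumption that the
markup field is a bare DocBook element name such as
<sgmltag class='element'>function</sgmltag> or
<sgmltag class='element'>command</sgmltag> is illustrative; check a
hints file dumped by an actual run for the exact spelling.</para>

<programlisting>
&nbsp;.\&quot; | mark fopen as function
&nbsp;.\&quot; | mark grep as command
</programlisting>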
<para>Hints derived from earlier files are also applied to later ones.
This behavior may be useful when lifting collections of documents that
apply to a function or command library.  More useful still, a hints
file dumped with <option>-h</option> can be one of
the file arguments to <command>doclifter</command>; the code detects
this special case and does not write XML output for such a file.
Thus, a good procedure for lifting a large library is to generate a
hints file with a first run, inspect it to delete false positives, and
use it as the first input to a second run.</para>
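
<para>The two-pass procedure just described might be sketched as
follows.  The hints file name <filename>library.hints</filename> and
the <filename>*.3</filename> glob are placeholders, and where the
lifted XML is written depends on options not shown here.</para>

<programlisting>
# First run: lift the pages and dump the global hints.
doclifter -h library.hints *.3
# Edit library.hints by hand to delete false positives, then
# rerun with the hints file as the first input.
doclifter library.hints *.3
</programlisting>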

<para>It is also possible to include a hints file directly in a troff
source file.  This may be useful if you want to enrich the file by
stages before converting to XML.</para>

</refsect1>

<refsect1 id="troubleshooting"><title>Troubleshooting</title>
<para><command>doclifter</command> tries to warn about problems that
it can diagnose but not fix by itself.  When it says
<computeroutput>"look for FIXME"</computeroutput>, do that in the
generated XML; the markup around that token may be wrong.</para>

<para>Occasionally (less than 2% of the time)
<command>doclifter</command> will produce invalid DocBook markup even
from correct troff markup.  Usually this results from strange
constructions in the source page, or macro calls that are beyond the
ability of <command>doclifter</command>'s macro processor to get
right.  Here are some things to watch for, and how to fix them:</para>

<refsect2><title>Malformed command synopses.</title>

<para>If you get a
message that says <computeroutput>"command synopsis parse
failed"</computeroutput>, try rewriting the synopsis in your manual
page source.  The most common cause of failure is unbalanced []
groupings, a bug that can be very difficult to notice by eyeball.  To
assist with this, the error message includes a token number in
parentheses indicating on which token the parse failed.</para>

<para>For more information, use the <option>-v</option> option.  This will trigger
a dump telling you what the command synopsis looked like after
preprocessing, and indicate on which token the parse failed (both with
a token number and a caret sign inserted in the dump of the synopsis
tokens).  To help locate unbalanced [] groupings, the
error token dump tries to insert &lsquo;$&rsquo; at the point of the
last nesting-depth increase, but the code that does this is
failure-prone.</para>
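
<para>As an example of the kind of imbalance to look for, consider the
following hypothetical synopsis lines (<command>frob</command> is an
invented command).  The first is unbalanced because the group opened
before <option>-b</option> is never closed; the second is the repaired
form.</para>

<programlisting>
frob [-a] [-b file [-c] input
frob [-a] [-b file] [-c] input
</programlisting>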

</refsect2>
<refsect2><title>Confusing macro calls.</title>

<para>Some manual page authors replace standard requests (like
<markup>.PP</markup>, <markup>.SH</markup> and <markup>.TP</markup>)
with versions that do different things in <command>nroff</command> and
<command>troff</command> environments.  While
<command>doclifter</command> tries to cope and usually does a good
job, the quirks of [nt]roff are legion and confusing macro calls
sometimes lead to bad XML being generated. A common symptom of such
problems is unclosed <sgmltag class='element'>Emphasis</sgmltag>
tags.</para>

</refsect2>
<refsect2><title>Malformed list syntax.</title>

<para> The manual-page parser can be confused by <markup>.TP</markup>
constructs that have header tags but no following body.  If the XML
produced doesn't validate, and the problem seems to be a misplaced
<sgmltag>listitem</sgmltag> tag, try using the verbose (-v) option.
This will enable line-numbered warnings that may help you zero in on
the problem.</para>

</refsect2>
<refsect2><title>Section nesting problems with SS.</title>

<para>The message <computeroutput>"possible section nesting
error"</computeroutput> means that the program has seen two adjacent
subsection headers.  In man pages, subsections don't have a depth
argument, so <command>doclifter</command> cannot be certain how
subsections should be nested. Any subsection heading between the
indicated line and the beginning of the next top-level section might
be wrong and require correcting by hand.</para>

</refsect2>
<refsect2><title>Bad output with no doclifter error message</title>

<para>If you're translating a page that uses user-defined macros, and
doclifter fails to complain about it but you get bad output, the first
thing to do is simplify or eliminate the user-defined macros.  Replace
them with stock requests where possible.</para>

</refsect2>
</refsect1>

<refsect1 id='improving'><title>Improving Translation Quality</title> 

<para>There are a few constructions that are a good idea to check by hand
after lifting a page.</para>

<para>Look near the <sgmltag class='element'>BlockQuote</sgmltag> tags.
The troff temporary indent request (<markup>.ti</markup>) is translated
into a <sgmltag class='element'>BlockQuote</sgmltag> wrapper around the
following line.  Sometimes <sgmltag class='element'>LiteralLayout</sgmltag>
or <sgmltag class='element'>ProgramListing</sgmltag> would be a better
translation, but <command>doclifter</command> has no way to know this.
</para>

<para>It is not possible to unambiguously detect candidates for
wrapping in a DocBook <sgmltag>option</sgmltag> tag in running
text. If you care, you'll have to check for these and fix them by
hand.</para>

<!-- para>The troff-level <markup>.nf</markup>/<markup>.fi</markup> macros
don't trigger display parsing (doing so would wildly complicate macro
interpretation).  If you are translating a document that uses them to wrap
function synopses, you can improve the translation by hand-hacking
them to <markup>.DS</markup>/<markup>.DE</markup> or the equivalent in
whatever macro set is active.</para -->

</refsect1>

<refsect1><title>Bugs And Limitations</title>

<para>About 3% of man pages will either make this program exit with error status
1 or generate invalid XML. In almost all such cases the misbehavior is
triggered by markup bugs in the source that are too severe to be
coped with.</para>

<para>Equation number arguments of EQN calls are ignored.</para>

<para>Semicolon used as a TBL field separator will lead to garbled
tables. The easiest way to fix this is by patching the source.</para>

<para>The function-synopsis parser is crude (it's not a compiler) and
prone to errors.  Function-synopsis markup should be checked carefully
by a human.</para>

<para>If a man page has both paragraphed text in a Synopsis section
and also a body section before the Synopsis section, bad things will
happen.</para>

<para>Running text (e.g., explanatory notes) at the end of a Synopsis
section cannot reliably be distinguished from synopsis-syntax
markup. (This problem is AI-complete.)</para>

<para>Some firewalls put in to cope with common malformations in troff
code mean that the tail end of a span between two
<markup>\f{B,I,U,(CW}</markup> or <markup>.ft</markup> highlight
changes may not be completely covered by corresponding <sgmltag
class='element'>Emphasis</sgmltag> tags if (for example) the span
crosses a boundary between filled and unfilled
(<markup>.nf</markup>/<markup>.fi</markup>) text.</para>

<para>The treatment of conditionals relies on the assumption that
conditional macros never generate structural or font-highlight markup
that differs between the if and else branches.  This appears to be
true of all the standard macro packages, but if you roll any of your
own macros you're on your own.</para>

<para>Macro definitions in a manual page NAME section are not
interpreted.</para>

<para>Uses of \c for line continuation sometimes are not translated,
leaving the \c in the output XML. The program will print a warning
when this occurs.</para>

<para>It is not possible to unambiguously detect candidates for
wrapping in a DocBook <sgmltag>option</sgmltag> tag in running
text. If you care, you'll have to check for these and fix them by
hand.</para>

<para>The line numbers in <command>doclifter</command> error messages
are unreliable in the presence of <markup>.EQ/.EN</markup>,
<markup>.PS/.PE</markup>, and quantum fluctuations.</para>
</refsect1>

<refsect1><title>Old macro sets</title>
<para>There is a conflict between Berkeley ms's documented
<markup>.P1</markup> print-header-on-page request and an undocumented
Bell Labs use for displayed program and equation listings.  The
<emphasis remap='B'>ms</emphasis> translator uses the Bell Labs
interpretation when <markup>.P2</markup> is present in the document,
and otherwise ignores the request.</para>
</refsect1>

<refsect1><title>Return Values</title>
<para>On successful completion, the program returns status 0.  
It returns 1 if some file or standard input could not be translated.  
It returns 2 if one of the input sources was a <markup>.so</markup> inclusion.
It returns 3 if there is an error in reading or writing files.  
It returns 4 to indicate an internal error.
It returns 5 when aborted by a keyboard interrupt. </para>

<para>Note that a zero return does not guarantee that the output is
valid DocBook.  It will almost always (as in, more than 98% of cases)
be syntactically valid XML, but in some rare cases fixups by hand may be
necessary to meet the semantics of the DocBook DTD.  Validation
problems are most likely to occur with complicated list markup.</para>
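
<para>A wrapper script can branch on these statuses.  The following is
a minimal shell sketch, assuming a single input file with the
placeholder name <filename>page.1</filename>; the messages simply
restate the list above.</para>

<programlisting>
#!/bin/sh
# Sketch: report doclifter's exit status for one input file.
# "page.1" is a placeholder file name.
doclifter page.1
status=$?
case $status in
    0) echo "translated; output may still need validation" ;;
    1) echo "some input could not be translated" ;;
    2) echo "an input source was a .so inclusion" ;;
    3) echo "error reading or writing files" ;;
    4) echo "internal error" ;;
    5) echo "aborted by keyboard interrupt" ;;
    *) echo "unexpected status $status" ;;
esac
</programlisting>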
</refsect1>

<refsect1><title>Requirements</title>
<para>The
<citerefentry><refentrytitle>pic2plot</refentrytitle><manvolnum>1</manvolnum></citerefentry>
utility must be installed in order to translate PIC diagrams to SVG.</para>
</refsect1>

<refsect1><title>See Also</title>
<para><citerefentry><refentrytitle>man</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>mdoc</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>ms</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>me</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>mm</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>mwww</refentrytitle><manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>troff</refentrytitle><manvolnum>1</manvolnum></citerefentry>.</para>
</refsect1>

<refsect1><title>Author</title>
<para>Eric S. Raymond <email>esr@thyrsus.com</email></para>