~ubuntu-branches/ubuntu/trusty/tagsoup/trusty-proposed

Viewing changes to .pc/0001-manpages.patch/tagsoup.1

Committer: Package Import Robot
Author(s): Emmanuel Bourg
Date: 2013-05-29 23:56:56 UTC
mfrom: (1.1.4)
Revision ID: package-import@ubuntu.com-20130529235656-du2tr6r4047oxtde

Tags: 1.2.1+-1

http://bugs.debian.org/639723

* Adopting package (Closes: #639723)
* The Maven artifacts are now deployed to /usr/share/maven-repo
* Improved the manpages (fixed the broken comment and the command syntax)
* debian/control:
  - Updated Standards-Version to 3.9.4
  - Updated the Vcs-* fields (tagsoup is back to trunk/)
* debian/rules:
  - Added a clean target
  - Added a get-orig-pom target to fetch the pom from
    the central Maven repository

files added:
.pc/.quilt_patches

.pc/.quilt_series

.pc/0001-manpages.patch

.pc/0001-manpages.patch/tagsoup.1

debian/libtagsoup-java.poms

debian/patches/0001-manpages.patch

debian/pom.xml

files modified:
.pc/applied-patches

debian/changelog

debian/control

debian/patches/series

debian/rules

tagsoup.1

.pc/0001-manpages.patch/tagsoup.1

\' TagSoup is licensed under the Apache License,

\' Version 2.0. You may obtain a copy of this license at

\' http://www.apache.org/licenses/LICENSE-2.0 . You may also have

\' additional legal rights not granted by this license.

\' TagSoup is distributed in the hope that it will be useful, but

\' unless required by applicable law or agreed to in writing, TagSoup

\' is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS

\' OF ANY KIND, either express or implied; not even the implied warranty

\' of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

.TH TAGSOUP "1" "January 2008" "TagSoup 1.2" "User Commands"

.SH NAME

tagsoup \- convert nasty, ugly HTML to clean XHTML

.SH SYNOPSIS

.B java -jar tagsoup-1.2

[

.I options

] [

.I files

]

.SH DESCRIPTION

.\" Add any additional description here

.PP

Rectify arbitrary HTML into clean XHTML,

using a tailored description of HTML.

The output will be well-formed XML, but not necessarily

.I valid

XHTML.

.PP

.TP

.B --files

multiple input

.I files

should be processed into corresponding output files

.TP

.BI --encoding= encoding

specifies the encoding of input files

.TP

.BI --output-encoding= encoding

specifies the encoding of the output

(if the encoding name begins with ``utf'',

the output will not contain character entities;

otherwise, all non-ASCII characters are

represented as entities)

.TP

.B --html

output rectified HTML rather than XML,

omitting the XML declaration

and any namespace declarations

.TP

.B --method=html

output rectified HTML rather than XML

(end-tags are omitted for empty elements,

and no character escaping is done in

script and style elements)

.TP

.B --omit-xml-declaration

omit the XML declaration

.TP

.B --lexical

output lexical features (specifically comments and any DOCTYPE declaration)

.TP

.B --nons

suppress namespaces in output

.TP

.B --nobogons

suppress unknown non-HTML elements in output

.TP

.B --nodefaults

suppress default attribute values

.TP

.B --nocolons

change explicit colons

in element and attribute names

to underscores

.TP

.B --norestart

don't restart any restartable elements

.TP

.B --ignorable

pass through ignorable whitespace

(whitespace in element-only content)

via SAX method handler ignorableWhitespace

.TP

.B --any

treat unknown non-HTML elements as allowing any content (default)

.TP

.B --emptybogons

treat unknown non-HTML elements as empty elements

.TP

.B --norootbogons

don't allow unknown non-HTML elements to be root elements

.TP

.BI --doctype-system= system-id

force DOCTYPE declaration to be output with specified system identifier

.TP

.BI --doctype-public= public-id

force DOCTYPE declaration to be output with specified public identifier

.TP

.B --standalone=[yes|no]

specify standalone pseudo-attribute in output XML declaration

.TP

.BI --version= version

specify version pseudo-attribute in output XML declaration

(does not affect actual version of XML output)

.TP

.B --nocdata

treat the CDATA-content elements

.I script

and

.I style

as ordinary elements

(mostly for testing)

.TP

.B --pyx

output PYX format rather than XML

(mostly for testing)

.TP

.B --pyxin

input is PYX-format HTML

(mostly for testing)

.TP

.B --reuse

reuse the same Parser object internally

(for testing only)

.TP

.B --help

output basic help

.TP

.B --version

output version number

.PP

.B TagSoup

is a parser and reformatter for nasty, ugly HTML.

Its normal processing mode is to accept HTML files on the command line,

or from the standard input if none are given, and output them

as clean XML

to the standard output. The encoding is assumed to be the platform-local

encoding on input, and is always UTF-8 on output.

.PP

When the

.B --files

option is given, each input file is processed into an output file of the

corresponding name, with the extension changed to

.IR xhtml .

If the extension is already

.IR xhtml ,

it is changed to

.IR xhtml_ .

.PP

TagSoup will repair, by whatever means necessary,

violations of XML well-formedness. In particular, it will fix up

malformed attribute names and supply missing attribute-value quotation marks.

More significantly, it supplies end-tags where HTML allows them

to be omitted, and sometimes where it doesn't. It will even supply

start-tags where necessary; for example, if a document begins with a

<li> tag, TagSoup will automatically prefix it with <html><body><ul>.

.PP

.SH BUGS

TagSoup can be fooled by missing close quotes after attribute values, and by

incorrect character encodings (it does not contain an encoding guesser).

.PP

TagSoup doesn't understand namespace declarations, which are not properly

part of HTML. Instead, any element or attribute name beginning

.IR foo :

will be put into the artificial namespace

.RI urn:x-prefix: foo .

.PP

For the same reasons, namespace-qualified attributes like

xml:space

can't be returned as default values,

though an explicit attribute in the xml namespace

will be returned with the proper namespace URI.

.SH AUTHOR

John Cowan <cowan@ccil.org>

.SH COPYRIGHT

.br

TagSoup is free software; see the source for copying conditions. There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
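
Two rules in the man page above are easier to check in code: the output file naming used with --files, and the artificial namespace assigned to prefixed names. The following is a minimal sketch written only from the man page text, not taken from the TagSoup sources. The class and method names are hypothetical, and the handling of a file name with no extension is an assumption, since the man page does not cover that case.

```java
/** Hypothetical helper illustrating two rules from the man page; not part of TagSoup. */
final class TagSoupManpageRules {

    // Namespace URI bound to the reserved "xml" prefix by the XML Namespaces spec.
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    private TagSoupManpageRules() {}

    /**
     * Output file name for --files: the extension is changed to "xhtml",
     * or to "xhtml_" when it is already "xhtml".
     */
    static String outputName(String inputName) {
        int dot = inputName.lastIndexOf('.');
        if (dot == -1) {
            // Assumption: a name without an extension simply gains ".xhtml".
            return inputName + ".xhtml";
        }
        if (inputName.endsWith(".xhtml")) {
            return inputName + "_";
        }
        return inputName.substring(0, dot) + ".xhtml";
    }

    /**
     * Namespace for an element or attribute name: a name beginning "foo:"
     * goes into "urn:x-prefix:foo"; an explicit "xml:" name keeps the real
     * XML namespace URI. Names without a prefix yield the empty string here.
     */
    static String artificialNamespace(String qName) {
        int colon = qName.indexOf(':');
        if (colon <= 0) {
            return "";
        }
        String prefix = qName.substring(0, colon);
        return prefix.equals("xml") ? XML_NS : "urn:x-prefix:" + prefix;
    }

    public static void main(String[] args) {
        System.out.println(outputName("index.html"));
        System.out.println(outputName("page.xhtml"));
        System.out.println(artificialNamespace("foo:bar"));
        System.out.println(artificialNamespace("xml:space"));
    }
}
```

The real command-line tool may treat edge cases such as dots in directory names differently; the sketch only mirrors the behaviour the man page states.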
