~ubuntu-branches/ubuntu/trusty/plainbox-provider-checkbox/trusty

Viewing changes to bin/xml_sanitize

  • Committer: Package Import Robot
  • Author(s): Zygmunt Krynicki
  • Date: 2014-04-07 19:00:31 UTC
  • mfrom: (3.1.1 sid)
  • Revision ID: package-import@ubuntu.com-20140407190031-rf836grml6oilfyt
Tags: 0.4-1
* New upstream release. List of bugfixes:
  https://launchpad.net/plainbox-provider-checkbox/14.04/0.4
* debian/watch: look for new releases on launchpad
* debian/rules: stop using pybuild and use manage.py
  {i18n,build,install,validate} instead. This also drops dependency on
  python3-distutils-extra and replaces that with intltool
* debian/control: drop X-Python3-Version
* debian/control: make plainbox-provider-checkbox depend on python and
  python2.7 (for some scripts) rather than suggesting them.
* debian/upstream/signing-key.asc: Use armoured gpg keys to avoid having to
  keep binary files in Debian packaging. Also, replace that with my key
  since I made the 0.3 release upstream.
* debian/source/lintian-overrides: add an override for warning about no
  source for flash movie with reference to a bug report that discusses that
  issue.
* debian/source/include-binaries: drop (no longer needed)
* debian/patches: drop (no longer needed)
* debian/plainbox-provider-checkbox.lintian-overrides: drop (no longer
  needed)
* Stop being a python3 module, move from DPMT to PAPT

#!/usr/bin/python3
import errno
import io
import sys

from argparse import ArgumentParser, FileType

# Code points allowed by the XML 1.0 Char production.
# range() excludes its end point, so each upper bound below is one past
# the last valid code point (0xD7FF, 0xFFFD and 0x10FFFF are all valid).
VALID_XML_CHARS = frozenset([0x9, 0xA, 0xD] +
                            list(range(0x20, 0xD800)) +
                            list(range(0xE000, 0xFFFE)) +
                            list(range(0x10000, 0x110000)))


def is_valid_xml_char(ch):
    # Is this character valid in XML?
    # http://www.w3.org/TR/xml/#charsets
    return ord(ch) in VALID_XML_CHARS


def main():
    # Pass the text as description=; the first positional parameter of
    # ArgumentParser is the program name (prog).
    parser = ArgumentParser(
        description="Receives as input some text and outputs "
                    "the same text without characters which are "
                    "not valid in the XML specification.")
    parser.add_argument('input_file',
                        type=FileType('r'),
                        nargs='?',
                        help='The name of the file to sanitize.')
    args = parser.parse_args()

    if args.input_file:
        text = ''.join([c for c in args.input_file.read() if
                       is_valid_xml_char(c)])

    else:
        # No file given: read stdin as UTF-8, dropping undecodable bytes.
        with io.TextIOWrapper(
                sys.stdin.buffer, encoding='UTF-8', errors="ignore") as stdin:
            text = ''.join([c for c in stdin.read() if is_valid_xml_char(c)])

    print(text)


if __name__ == "__main__":
    try:
        sys.exit(main())
    except Exception as err:
        # Ignore a broken pipe (for example when the output is piped to
        # `head`); not every exception has an errno attribute, so look it
        # up defensively and re-raise everything else unchanged.
        if getattr(err, 'errno', None) != errno.EPIPE:
            raise
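
The script above filters its input one character at a time against a set of code points. As an illustration only, the same filter can be sketched with a single regular expression. This is not part of bin/xml_sanitize: the names INVALID_XML_RE and sanitize_xml_text are hypothetical, and the sketch assumes the Char production from http://www.w3.org/TR/xml/#charsets with the upper bounds 0xD7FF, 0xFFFD and 0x10FFFF treated as inclusive, as that production states.

```python
import re

# Hypothetical names for illustration; not part of bin/xml_sanitize.
# Negated character class: matches anything outside the XML 1.0 Char
# production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
# [#x10000-#x10FFFF]).
INVALID_XML_RE = re.compile(
    r'[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]')


def sanitize_xml_text(text):
    """Return text with every character that XML 1.0 forbids removed."""
    return INVALID_XML_RE.sub('', text)


if __name__ == "__main__":
    # NUL, vertical tab and U+FFFE are dropped; U+FFFD and newline are kept.
    sample = 'ok\x00\x0bdone\ufffe\ufffd\n'
    print(ascii(sanitize_xml_text(sample)))  # prints 'okdone\ufffd\n'
```

A compiled pattern avoids building a set of more than a million integers at import time, at the cost of a less self-explanatory definition than the explicit ranges used in the script.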