Miscellaneous features
======================
URLs
----
For URL handling XIST uses the module :mod:`ll.url`. Refer to its documentation
for the basic functionality (especially regarding the methods
:meth:`~ll.url.URL.__div__` and :meth:`ll.url.URL.relative`).
When XIST parses an XML resource it uses a so-called "base" URL.
This base URL can be passed to all parsing functions. If it isn't specified,
it defaults to the URL of the resource being parsed. This base URL will
be prepended to all URLs that are read during parsing:
.. sourcecode:: pycon
>>> from ll.xist import parse
>>> from ll.xist.ns import html
>>> node = parse.parsestring('
', base="root:spam/index.html")
>>> print node.string()
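The effect of a base URL can be approximated with the standard library alone. The following is a minimal sketch that uses ``urllib.parse.urljoin`` as a stand-in for XIST's own URL handling. The ``http://example.org`` base and the sample links are made-up values, and :mod:`ll.url` may differ in details such as its ``root`` scheme.

```python
from urllib.parse import urljoin

# Assumed values for illustration only (not taken from XIST).
base = "http://example.org/spam/index.html"
links = ["eggs.png", "../ham.css", "#top", "http://example.com/logo.gif"]

# Joining resolves each link against the base, similar in spirit to what
# happens to URLs that are read during parsing.
resolved = {link: urljoin(base, link) for link in links}

for link, absolute in resolved.items():
    print(f"{link!r:32} -> {absolute}")
```

Relative links pick up the directory of the base URL, while a link that is already absolute is left untouched.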
For publishing, a base URL can be specified too. URLs will be published
relative to this base URL, with the exception of relative URLs in the tree,
which are left as they are. This means:
* When you have a relative URL (e.g. ``#top``) generated by a :meth:`convert`
call, this URL will stay the same when publishing.
* Base URLs for parsing should never be relative: Relative base URLs will be
prepended to all relative URLs in the file, but this will not be reverted for
publishing. In most cases the base URL should be a ``root`` URL when you parse
local files.
* When you parse remote web pages you can either omit the :obj:`base` argument,
so that it defaults to the URL being parsed and links, images, etc. on
the page still point back to their original location, or use the empty URL
``URL()`` as the base, so that you get all URLs in the page as they are.
* When XIST is used as a compiler for static pages, you're going to read source
XML files, do a conversion and write the result to a new target file.
In this case you should probably use the URL of the target file for both
parsing and publishing. Let's assume we have a URL ``#top`` in the source
file. When we use the "real" file names for parsing and publishing like this:

.. sourcecode:: python

   from ll.xist import parse

   node = parse.parsefile("spam.htmlxsc", base="root:spam.htmlxsc")
   node = node.conv()
   node.write(open("spam.html", "wb"), base="root:spam.html")

the following will happen: The URL ``#top`` will be parsed as
``root:spam.htmlxsc#top``. After conversion this will be written to
:file:`spam.html` relative to the URL ``root:spam.html``, which results
in ``spam.html#top``, which works, but is not what you want.
When you use ``root:spam.html`` both for parsing and publishing, ``#top``
will be written to the target file as expected.
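The advice to use the target URL for both parsing and publishing can be illustrated without XIST. The sketch below is an approximation built on the standard library: ``urljoin`` stands in for parsing with a base URL, and ``relative_to`` is a hypothetical helper (not part of :mod:`ll.url`) that stands in for publishing relative to a base URL. The ``http://example.org`` URLs are made up, and the exact strings XIST produces may differ.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def relative_to(base, target):
    """Hypothetical helper: express ``target`` relative to ``base``."""
    b, t = urlsplit(base), urlsplit(target)
    if (b.scheme, b.netloc) != (t.scheme, t.netloc):
        return target  # different site: keep the URL absolute
    suffix = ("?" + t.query if t.query else "") + ("#" + t.fragment if t.fragment else "")
    if b.path == t.path and suffix:
        return suffix  # same document: only query and fragment remain
    start = posixpath.dirname(b.path) or "/"
    return posixpath.relpath(t.path or "/", start) + suffix


target = "http://example.org/spam.html"  # assumed URL of the output file

# Parse and publish with the same base: the link survives unchanged.
parsed = urljoin(target, "#top")
roundtrip = relative_to(target, parsed)
print(roundtrip)

# A link to another resource is published relative to the target file.
sibling = relative_to(target, urljoin(target, "img/eggs.png"))
print(sibling)
```

Because the base is identical in both steps, joining and relativizing cancel each other out, which is why ``#top`` reaches the target file unchanged.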
Pretty printing XML
-------------------
The method :meth:`pretty` can be used for pretty printing XML. It returns a
new version of the node, with additional white space between the elements:
.. sourcecode:: python

   from ll.xist.ns import html

   node = html.html(
       html.head(
           html.title("foo"),
       ),
       html.body(
           html.div(
               html.h1("The ", html.em("foo"), " page!"),
               html.p("Welcome to the ", html.em("foo"), " page."),
           ),
       ),
   )

   print node.pretty().bytes()

This will print:
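.. sourcecode:: xml

   <html>
       <head>
           <title>foo</title>
       </head>
       <body>
           <div>
               <h1>The <em>foo</em> page!</h1>
               <p>Welcome to the <em>foo</em> page.</p>
           </div>
       </body>
   </html>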