xfind – Tree traversal and filtering
This module contains XFind selectors and related classes and functions.
A selector specifies a condition that a node in an XIST tree must satisfy to
match the selector. For example, the method Node.walk() will only output
nodes that match the specified selector.
Selectors can be combined with various operations and form a language comparable to XPath but implemented as Python expressions.
- ll.xist.xfind.filter(iter, *selectors)
Filter an iterator over xsc.Cursor objects against a Selector object.
Example:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> [c.node.string() for c in xfind.filter(doc.walk(), html.b, html.title)]
[
    '<title>Welcome to Python.org</title>',
    '<b>Web Programming</b>',
    '<b>GUI Development</b>',
    '<b>Scientific and Numeric</b>',
    '<b>Software Development</b>',
    '<b>System Administration</b>'
]
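The filtering idea can be sketched without XIST at all. The following is a toy model only, not the library's implementation: `Cursor`, `filter_cursors` and `is_title_or_body` are hypothetical names, nodes are plain strings, and a selector is modelled as a predicate on the path.

```python
from collections import namedtuple

# Hypothetical stand-in for xsc.Cursor: only the two attributes used here.
Cursor = namedtuple("Cursor", ["path", "node"])


def filter_cursors(cursors, matches):
    """Yield only those cursors whose path satisfies ``matches``."""
    for cursor in cursors:
        if matches(cursor.path):
            yield cursor


# In this toy model a path is a list of node names from the root to the node.
cursors = [
    Cursor(path=["html"], node="html"),
    Cursor(path=["html", "head"], node="head"),
    Cursor(path=["html", "head", "title"], node="title"),
    Cursor(path=["html", "body"], node="body"),
]


def is_title_or_body(path):
    # Plays the role of passing two selectors, which are combined with "or".
    return path[-1] in {"title", "body"}


selected = [c.node for c in filter_cursors(cursors, is_title_or_body)]
print(selected)  # ['title', 'body']
```

Because the filter is a generator, it consumes the underlying iterator lazily and preserves traversal order.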
- ll.xist.xfind.selector(*objs)
Create a Selector object from objs.
If objs is empty (i.e. selector() is called without arguments), any is returned (which matches every node).
If more than one argument is passed (or the argument is a tuple), an OrCombinator is returned.
Otherwise the following steps are taken for the single argument obj:
  - if obj already is a Selector object, it is returned unchanged;
  - if obj is a Node subclass, an IsInstanceSelector is returned (which matches if the node is an instance of this class);
  - if obj is a Node instance, an IsSelector is returned (which matches only obj);
  - if obj is callable, a CallableSelector is returned (where matching is done by calling obj);
  - if obj is None, any will be returned;
  - otherwise selector() will raise a TypeError.
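The dispatch order above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the library's code: `Node`, `Selector` and `describe_selector` are hypothetical stand-ins, and the function returns only the name of the selector kind that the documented rules would choose.

```python
class Node:
    """Hypothetical stand-in for an XIST node class."""


class Selector:
    """Hypothetical stand-in for the selector base class."""


def describe_selector(*objs):
    """Return the name of the selector kind chosen by the documented rules."""
    if not objs:
        return "any"
    if len(objs) > 1:
        return "OrCombinator"
    obj = objs[0]
    if isinstance(obj, tuple):
        return "OrCombinator"
    if isinstance(obj, Selector):
        return "unchanged"
    # Classes are callable too, so this test has to come before callable().
    if isinstance(obj, type) and issubclass(obj, Node):
        return "IsInstanceSelector"
    if isinstance(obj, Node):
        return "IsSelector"
    if callable(obj):
        return "CallableSelector"
    if obj is None:
        return "any"
    raise TypeError(f"can't convert {obj!r} to a selector")


print(describe_selector())              # any
print(describe_selector(Node))          # IsInstanceSelector
print(describe_selector(Node()))        # IsSelector
print(describe_selector(len))           # CallableSelector
print(describe_selector(Node, len))     # OrCombinator
```

The order of the checks matters in this sketch: a Node subclass is itself callable, so testing for callables first would classify it incorrectly.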
- class ll.xist.xfind.Selector
Bases: object
A selector specifies a condition that a node in an XIST tree must satisfy to match the selector.
Whether a node matches the selector can be specified by overriding the __contains__() method. Selectors can be combined with various operations (see methods below).
- __contains__(path)
Return whether path (which is a list of XIST nodes from the root of the tree to the node in question) matches the selector.
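Since matching is expressed through `__contains__()`, a path can be tested with Python's `in` operator. The sketch below uses hypothetical classes (`Selector` here is a toy base class, not `ll.xist.xfind.Selector`, and `NameSelector` is invented for illustration) with node names standing in for nodes.

```python
class Selector:
    """Hypothetical minimal base class for this sketch."""

    def __contains__(self, path):
        return False


class NameSelector(Selector):
    """Matches when the last node in the path has the given name."""

    def __init__(self, name):
        self.name = name

    def __contains__(self, path):
        # path runs from the root to the node in question.
        return path[-1] == self.name


path = ["html", "body", "p"]
print(path in NameSelector("p"))     # True
print(path in NameSelector("body"))  # False
```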
- __truediv__(other)
Create a ChildCombinator with self as the left hand selector and other as the right hand selector.
- __rtruediv__(other)
Create a ChildCombinator with other as the left hand selector and self as the right hand selector.
- __floordiv__(other)
Create a DescendantCombinator with self as the left hand selector and other as the right hand selector.
- __rfloordiv__(other)
Create a DescendantCombinator with other as the left hand selector and self as the right hand selector.
- __mul__(other)
Create an AdjacentSiblingCombinator with self as the left hand selector and other as the right hand selector.
- __rmul__(other)
Create an AdjacentSiblingCombinator with other as the left hand selector and self as the right hand selector.
- __pow__(other)
Create a GeneralSiblingCombinator with self as the left hand selector and other as the right hand selector.
- __rpow__(other)
Create a GeneralSiblingCombinator with other as the left hand selector and self as the right hand selector.
- __and__(other)
Create an AndCombinator from self and other.
- __rand__(other)
Create an AndCombinator from other and self.
- __or__(other)
Create an OrCombinator from self and other.
- __ror__(other)
Create an OrCombinator from other and self.
- __invert__()
Create a NotCombinator inverting self.
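How operator overloading turns a Python expression into a tree of combinators can be sketched with a toy class hierarchy. All names below (`Selector`, `Combined`, `Name`) are hypothetical and only label the combinator kinds; only one reflected operator is shown to keep the sketch short.

```python
class Selector:
    """Toy base class: each operator returns a node of an expression tree."""

    def __truediv__(self, other):      # self / other
        return Combined("child", self, other)

    def __rtruediv__(self, other):     # other / self, other is no Selector
        return Combined("child", other, self)

    def __floordiv__(self, other):     # self // other
        return Combined("descendant", self, other)

    def __mul__(self, other):          # self * other
        return Combined("adjacent", self, other)

    def __pow__(self, other):          # self ** other
        return Combined("sibling", self, other)

    def __and__(self, other):          # self & other
        return Combined("and", self, other)

    def __or__(self, other):           # self | other
        return Combined("or", self, other)

    def __invert__(self):              # ~self
        return Combined("not", self)


class Combined(Selector):
    def __init__(self, kind, *operands):
        self.kind = kind
        self.operands = operands

    def __repr__(self):
        return f"{self.kind}({', '.join(map(repr, self.operands))})"


class Name(Selector):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


expr = Name("div") // Name("a") & ~Name("b")
reflected = "div" / Name("a")
print(expr)       # and(descendant(div, a), not(b))
print(reflected)  # child('div', a)
```

Python's operator precedence decides the grouping: `//` binds tighter than `&`, and `&` tighter than `|`, so parentheses are needed whenever a different grouping is intended. The reflected method is what allows the left operand to be something other than a selector.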
- class ll.xist.xfind.AnySelector
Bases: Selector
Selector that selects all nodes.
An instance of this class named any is created as a module global, i.e. you can use xfind.any.
- class ll.xist.xfind.IsInstanceSelector
Bases: Selector
Selector that selects all nodes that are instances of the specified type. You can either create an IsInstanceSelector object directly or simply pass a class to a function that expects a selector (this class will be automatically wrapped in an IsInstanceSelector):
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(html.a):
...     print(node.attrs.href, node.attrs.title)
...
https://www.python.org/#content Skip to content
https://www.python.org/#python-network
https://www.python.org/ The Python Programming Language
https://www.python.org/psf-landing/ The Python Software Foundation
...
- class ll.xist.xfind.element
Bases: Selector
Selector that selects all elements that have a specified namespace name and element name:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.element(html, "img")):
...     print(node.string())
...
<img alt="python™" class="python-logo" src="https://www.python.org/static/img/python-logo.png" />
- class ll.xist.xfind.procinst
Bases: Selector
Selector that selects all processing instructions that have a specified name.
- class ll.xist.xfind.entity
Bases: Selector
Selector that selects all entities that have a specified name.
- class ll.xist.xfind.IsSelector
Bases: Selector
Selector that selects one specific node in the tree. This can be combined with other selectors via ChildCombinator or DescendantCombinator selectors to select children of this specific node. You can either create an IsSelector directly or simply pass a node to a function that expects a selector:
>>> from ll.xist import xsc, parse
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(doc[0]/xsc.Element):
...     print(repr(node))
...
<element ll.xist.ns.html.head xmlns='http://www.w3.org/1999/xhtml' (89 children/no attrs) location='https://www.python.org/:?:?' at 0x104ad7630>
<element ll.xist.ns.html.body xmlns='http://www.w3.org/1999/xhtml' (14 children/2 attrs) location='https://www.python.org/:?:?' at 0x104cc1f28>
- class ll.xist.xfind.IsRootSelector
Bases: Selector
Selector that selects the node that is the root of the traversal.
An instance of this class named isroot is created as a module global, i.e. you can use xfind.isroot.
- class ll.xist.xfind.IsEmptySelector
Bases: Selector
Selector that selects all empty elements or fragments.
An instance of this class named empty is created as a module global, i.e. you can use xfind.empty:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.empty):
...     print(node.string())
...
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<link href="https://ajax.googleapis.com/" rel="prefetch" />
<meta name="application-name" content="Python.org" />
...
- class ll.xist.xfind.OnlyChildSelector
Bases: Selector
Selector that selects all nodes that are the only child of their parents.
An instance of this class named onlychild is created as a module global, i.e. you can use xfind.onlychild:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.onlychild & html.a):
...     print(node.string())
...
<a class="text-shrink" href="javascript:;" title="Make Text Smaller">Smaller</a>
<a class="text-grow" href="javascript:;" title="Make Text Larger">Larger</a>
<a class="text-reset" href="javascript:;" title="Reset any font size changes I have made">Reset</a>
<a href="http://plus.google.com/+Python"><span aria-hidden="true" class="icon-google-plus"></span>Google+</a>
...
- class ll.xist.xfind.OnlyOfTypeSelector
Bases: Selector
Selector that selects all nodes that are the only nodes of their type among their siblings.
An instance of this class named onlyoftype is created as a module global, i.e. you can use xfind.onlyoftype:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.onlyoftype & xsc.Element):
...     print(repr(node))
...
<element ll.xist.ns.html.html xmlns='http://www.w3.org/1999/xhtml' (7 children/3 attrs) location='https://www.python.org/:?:?' at 0x108858d30>
<element ll.xist.ns.html.head xmlns='http://www.w3.org/1999/xhtml' (89 children/no attrs) location='https://www.python.org/:?:?' at 0x108858630>
<element ll.xist.ns.html.title xmlns='http://www.w3.org/1999/xhtml' (1 child/no attrs) location='https://www.python.org/:?:?' at 0x108c547b8>
<element ll.xist.ns.html.body xmlns='http://www.w3.org/1999/xhtml' (14 children/2 attrs) location='https://www.python.org/:?:?' at 0x108c54eb8>
...
- class ll.xist.xfind.hasattr
Bases: Selector
Selector that selects all element nodes that have an attribute with one of the specified names. (Names can be strings, (attribute name, namespace name) tuples or attribute classes or instances):
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.hasattr("id")):
...     print(node.xmlname, node.attrs.id)
...
body homepage
div touchnav-wrapper
div top
a close-python-network
...
- class ll.xist.xfind.attrhasvalue
Bases: Selector
Selector that selects all element nodes where an attribute with the specified name has one of the specified values. (Names can be strings, (attribute name, namespace name) tuples or attribute classes or instances). Note that “fancy” attributes (i.e. those containing non-text) will not be considered:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.attrhasvalue("rel", "stylesheet")):
...     print(node.attrs.href)
...
https://www.python.org/static/stylesheets/style.css
https://www.python.org/static/stylesheets/mq.css
- class ll.xist.xfind.attrcontains
Bases: Selector
Selector that selects all element nodes where an attribute with the specified name contains one of the specified substrings in its value. (Names can be strings, (attribute name, namespace name) tuples or attribute classes or instances). Note that “fancy” attributes (i.e. those containing non-text) will not be considered:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.attrcontains("rel", "stylesheet")):
...     print(node.attrs.rel, node.attrs.href)
...
stylesheet https://www.python.org/static/stylesheets/style.css
stylesheet https://www.python.org/static/stylesheets/mq.css
- class ll.xist.xfind.attrstartswith
Bases: Selector
Selector that selects all element nodes where an attribute with the specified name starts with any of the specified strings. (Names can be strings, (attribute name, namespace name) tuples or attribute classes or instances). Note that “fancy” attributes (i.e. those containing non-text) will not be considered:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.attrstartswith("class", "icon-")):
...     print(node.bytes())
...
b'<span aria-hidden="true" class="icon-arrow-down"><span>\xe2\x96\xbc</span></span>'
b'<span aria-hidden="true" class="icon-arrow-up"><span>\xe2\x96\xb2</span></span>'
b'<span aria-hidden="true" class="icon-search"></span>'
b'<span aria-hidden="true" class="icon-facebook"></span>'
...
- class ll.xist.xfind.attrendswith
Bases: Selector
Selector that selects all element nodes where an attribute with the specified name ends with one of the specified strings. (Names can be strings, (attribute name, namespace name) tuples or attribute classes or instances). Note that “fancy” attributes (i.e. those containing non-text) will not be considered:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.attrendswith("href", ".css")):
...     print(node.attrs.href)
...
https://www.python.org/static/stylesheets/style.css
https://www.python.org/static/stylesheets/mq.css
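The four attribute value tests differ only in the string comparison they apply. A rough sketch on plain dictionaries follows; the function names are hypothetical, attributes are assumed to be plain strings (so the “fancy” attribute case does not arise), and this is not how the library stores attributes.

```python
def has_value(attrs, name, *values):
    """True if the attribute exists and equals one of the values."""
    return name in attrs and attrs[name] in values


def contains(attrs, name, *substrings):
    """True if the attribute exists and contains one of the substrings."""
    return name in attrs and any(s in attrs[name] for s in substrings)


def starts_with(attrs, name, *prefixes):
    """True if the attribute exists and starts with one of the prefixes."""
    # str.startswith accepts a tuple of candidate prefixes.
    return name in attrs and attrs[name].startswith(prefixes)


def ends_with(attrs, name, *suffixes):
    """True if the attribute exists and ends with one of the suffixes."""
    return name in attrs and attrs[name].endswith(suffixes)


link = {
    "rel": "stylesheet",
    "href": "https://www.python.org/static/stylesheets/style.css",
}
span = {"class": "icon-search", "aria-hidden": "true"}

print(has_value(link, "rel", "stylesheet", "alternate"))  # True
print(contains(link, "href", "static"))                   # True
print(starts_with(span, "class", "icon-"))                # True
print(ends_with(link, "href", ".css", ".less"))           # True
print(has_value(span, "rel", "stylesheet"))               # False
```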
- class ll.xist.xfind.hasid
Bases: Selector
Selector that selects all element nodes where the id attribute has one of the specified values:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.hasid("id-search-field")):
...     print(node.string())
...
<input class="search-field" id="id-search-field" name="q" placeholder="Search" role="textbox" tabindex="1" type="search" />
- class ll.xist.xfind.hasclass
Bases: Selector
Selector that selects all element nodes where the class attribute contains one of the specified values:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.hasclass("tier-1")/html.a):
...     print(node.string())
...
A
A
Socialize
Sign In
About
Downloads
...
- class ll.xist.xfind.InAttrSelector
Bases: Selector
Selector that selects all attribute nodes and nodes inside of attributes:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for path in doc.walkpaths(xfind.inattr & xsc.Text, enterattrs=True, enterattr=True):
...     print(path[-3].xmlname, path[-2].xmlname, path[-1].string())
...
html class no-js
html dir ltr
html lang en
meta charset utf-8
meta content IE=edge
meta http-equiv X-UA-Compatible
...
- class ll.xist.xfind.Combinator
Bases: Selector
A Combinator is a selector that transforms one or combines two or more other selectors in a certain way.
- class ll.xist.xfind.BinaryCombinator
Bases: Combinator
A BinaryCombinator is a combinator that combines two selectors: the left hand selector and the right hand selector.
- class ll.xist.xfind.ChildCombinator
Bases: BinaryCombinator
A ChildCombinator is a BinaryCombinator. To match the ChildCombinator the node must match the right hand selector and its immediate parent must match the left hand selector (i.e. it works similarly to the > combinator in CSS or the / combinator in XPath). ChildCombinator objects can be created via the division operator (/):
>>> from ll.xist import xsc, parse
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(html.a/html.img):
...     print(node.string())
...
<img alt="python™" class="python-logo" src="https://www.python.org/static/img/python-logo.png" />
- class ll.xist.xfind.DescendantCombinator
Bases: BinaryCombinator
A DescendantCombinator is a BinaryCombinator. To match the DescendantCombinator the node must match the right hand selector and any of its ancestor nodes must match the left hand selector (i.e. it works similarly to the descendant combinator in CSS or the // combinator in XPath). DescendantCombinator objects can be created via the floor division operator (//):
>>> from ll.xist import xsc, parse
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(html.div//html.img):
...     print(node.string())
...
<img alt="python™" class="python-logo" src="https://www.python.org/static/img/python-logo.png" />
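The difference between matching the immediate parent and matching any ancestor can be sketched on plain paths. This is a toy model, not the library's implementation: the helpers `name`, `child` and `descendant` are hypothetical, selectors are modelled as predicates on a path, and node names stand in for nodes.

```python
def name(n):
    """Predicate on paths: the last node in the path has the given name."""
    return lambda path: path[-1] == n


def child(left, right):
    """right must match the node, left must match its immediate parent."""
    return lambda path: len(path) > 1 and right(path) and left(path[:-1])


def descendant(left, right):
    """right must match the node, left must match at least one ancestor."""
    return lambda path: right(path) and any(
        left(path[:i]) for i in range(1, len(path))
    )


# Path from the root down to an img element.
path = ["html", "body", "div", "a", "img"]

print(child(name("a"), name("img"))(path))         # True
print(child(name("div"), name("img"))(path))       # False
print(descendant(name("div"), name("img"))(path))  # True
print(descendant(name("img"), name("img"))(path))  # False
```

Each ancestor is tested with its own path prefix, so the left hand side may itself be a combined selector. The last call is false because a node is not its own ancestor.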
- class ll.xist.xfind.AdjacentSiblingCombinator
Bases: BinaryCombinator
An AdjacentSiblingCombinator is a BinaryCombinator. To match the AdjacentSiblingCombinator the node must match the right hand selector and the immediately preceding sibling must match the left hand selector. AdjacentSiblingCombinator objects can be created via the multiplication operator (*). The following example outputs all span elements that immediately follow a form element:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(html.form*html.span):
...     print(node.string())
...
<span class="breaker"></span>
- class ll.xist.xfind.GeneralSiblingCombinator
Bases: BinaryCombinator
A GeneralSiblingCombinator is a BinaryCombinator. To match the GeneralSiblingCombinator the node must match the right hand selector and any of the preceding siblings must match the left hand selector. GeneralSiblingCombinator objects can be created via the exponentiation operator (**). The following example outputs all meta elements that come after a link element:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(html.link**html.meta):
...     print(node.string())
...
<meta name="application-name" content="Python.org" />
<meta name="msapplication-tooltip" content="The official home of the Python Programming Language" />
<meta name="apple-mobile-web-app-title" content="Python.org" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
...
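The two sibling combinators differ only in how far back they look. A minimal sketch on a flat list of siblings follows; the helper names are hypothetical, node names stand in for nodes, and the real classes work on paths rather than on a sibling list and an index.

```python
def is_(name):
    """Predicate on a single node (here simply a name)."""
    return lambda node: node == name


def adjacent_sibling(left, right, siblings, index):
    """siblings[index] matches right, the sibling directly before it left."""
    return (
        index > 0
        and right(siblings[index])
        and left(siblings[index - 1])
    )


def general_sibling(left, right, siblings, index):
    """siblings[index] matches right, some earlier sibling matches left."""
    return right(siblings[index]) and any(left(s) for s in siblings[:index])


siblings = ["link", "meta", "title", "meta"]
positions = range(len(siblings))

adjacent = [
    i for i in positions
    if adjacent_sibling(is_("link"), is_("meta"), siblings, i)
]
general = [
    i for i in positions
    if general_sibling(is_("link"), is_("meta"), siblings, i)
]
print(adjacent)  # [1]
print(general)   # [1, 3]
```

The second meta is only found by the general variant, because a title sits between it and the link.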
- class ll.xist.xfind.ChainedCombinator
Bases: Combinator
A ChainedCombinator combines any number of other selectors.
- class ll.xist.xfind.OrCombinator
Bases: ChainedCombinator
An OrCombinator is a ChainedCombinator where the node must match at least one of the selectors to match the OrCombinator. An OrCombinator can be created with the binary or operator (|):
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.hasattr("href") | xfind.hasattr("src")):
...     print(node.attrs.href if "href" in node.Attrs else node.attrs.src)
...
https://ajax.googleapis.com/
https://www.python.org/static/js/libs/modernizr.js
https://www.python.org/static/stylesheets/style.css
https://www.python.org/static/stylesheets/mq.css
https://www.python.org/static/favicon.ico
...
- class ll.xist.xfind.AndCombinator
Bases: ChainedCombinator
An AndCombinator is a ChainedCombinator where the node must match all of the combined selectors to match the AndCombinator. An AndCombinator can be created with the binary and operator (&):
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(html.input & xfind.hasattr("id")):
...     print(node.string())
...
<input class="search-field" id="id-search-field" name="q" placeholder="Search" role="textbox" tabindex="1" type="search" />
- class ll.xist.xfind.NotCombinator
Bases: Combinator
A NotCombinator inverts the selection logic of the underlying selector, i.e. a node matches only if it does not match the underlying selector. A NotCombinator can be created with the unary inversion operator (~).
The following example outputs all internal scripts:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(html.script & ~xfind.hasattr("src")):
...     print(node.string())
...
<script type="text/javascript">
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-39055973-1']);
    _gaq.push(['_trackPageview']);
    (function() {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
    })();
</script>
<script>window.jQuery || document.write('<script src="/static/js/libs/jquery-1.8.2.min.js"><\/script>')</script>
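The logic of the or, and and not combinators maps directly onto Python's `any()`, `all()` and `not`. The sketch below is a toy model with hypothetical helper names; nodes are plain dictionaries and selectors are predicates on a node rather than on a path.

```python
def or_(*selectors):
    """Match if at least one of the selectors matches."""
    return lambda node: any(s(node) for s in selectors)


def and_(*selectors):
    """Match only if every selector matches."""
    return lambda node: all(s(node) for s in selectors)


def not_(selector):
    """Match exactly when the underlying selector does not."""
    return lambda node: not selector(node)


def named(name):
    return lambda node: node["name"] == name


def has_attr(attr):
    return lambda node: attr in node["attrs"]


nodes = [
    {"name": "script", "attrs": {"src": "a.js"}},
    {"name": "script", "attrs": {}},
    {"name": "img", "attrs": {"src": "logo.png"}},
    {"name": "a", "attrs": {"href": "/"}},
]

internal_scripts = and_(named("script"), not_(has_attr("src")))
linked = or_(has_attr("href"), has_attr("src"))

print([i for i, n in enumerate(nodes) if internal_scripts(n)])  # [1]
print([i for i, n in enumerate(nodes) if linked(n)])            # [0, 2, 3]
```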
- class ll.xist.xfind.CallableSelector
Bases: Selector
A CallableSelector is a selector that calls a user specified callable to select nodes. The callable gets passed the path and must return a bool specifying whether this path is selected. A CallableSelector is created implicitly whenever a callable is passed to a method that expects a selector.
The following example outputs all links that point outside the python.org domain:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> def isextlink(path):
...     return isinstance(path[-1], html.a) and not str(path[-1].attrs.href).startswith("https://www.python.org")
...
>>> for node in doc.walknodes(isextlink):
...     print(node.string())
...
<a href="http://docs.python.org/" title="Python Documentation">Docs</a>
<a href="https://pypi.python.org/" title="Python Package Index">PyPI</a>
<a class="text-shrink" href="javascript:;" title="Make Text Smaller">Smaller</a>
<a class="text-grow" href="javascript:;" title="Make Text Larger">Larger</a>
...
- class ll.xist.xfind.nthchild
Bases: Selector
An nthchild object is a selector that selects every node that is the n-th child of its parent. E.g. nthchild(0) selects every first child, nthchild(-1) selects each last child. Furthermore nthchild("even") selects each first, third, fifth, … child and nthchild("odd") selects each second, fourth, sixth, … child (the index is zero-based, so the first child has the even index 0).
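The index rules can be checked with a small sketch. It assumes zero-based positions, which is what nthchild(0) selecting the first child implies; `nth_matches` and `select` are hypothetical helpers, not part of the library.

```python
def nth_matches(index, position, count):
    """Test a zero-based position among count siblings against index."""
    if index == "even":
        return position % 2 == 0
    if index == "odd":
        return position % 2 == 1
    if index < 0:
        # Negative indices count from the end, as with Python sequences.
        index += count
    return position == index


children = ["a", "b", "c", "d", "e"]


def select(index):
    return [
        child for position, child in enumerate(children)
        if nth_matches(index, position, len(children))
    ]


print(select(0))       # ['a']
print(select(-1))      # ['e']
print(select("even"))  # ['a', 'c', 'e']
print(select("odd"))   # ['b', 'd']
```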
- class ll.xist.xfind.nthoftype
Bases: Selector
An nthoftype object is a selector that selects every node that is the n-th node of a specified type among its siblings. Similar to nthchild, nthoftype supports negative and positive indices as well as "even" and "odd". Which types are checked can be passed explicitly. If no types are passed the type of the node itself is used:
>>> from ll.xist import xsc, parse, xfind
>>> from ll.xist.ns import xml, html, chars
>>> doc = parse.tree(
...     parse.URL("https://www.python.org/"),
...     parse.Tidy(),
...     parse.NS(html),
...     parse.Node(pool=xsc.Pool(xml, html, chars))
... )
>>> for node in doc.walknodes(xfind.nthoftype(0, html.h2)):
...     print(node.string())
...
<h2 class="widget-title"><span aria-hidden="true" class="icon-get-started"></span>Get Started</h2>
<h2 class="widget-title"><span aria-hidden="true" class="icon-download"></span>Download</h2>
<h2 class="widget-title"><span aria-hidden="true" class="icon-documentation"></span>Docs</h2>
<h2 class="widget-title"><span aria-hidden="true" class="icon-jobs"></span>Jobs</h2>
...