ll.url – RFC 2396 compliant URLs
ll.url contains an RFC 2396 compliant implementation of URLs, classes for accessing resource metadata, and file-like classes for reading and writing resource data.
These three levels of functionality are implemented in three classes:
URL
URL objects are the names of resources and can be used and modified regardless of whether these resources actually exist. URL objects never hit the hard drive or the net.
Connection
Connection objects contain functionality that accesses and changes file metadata (such as last modified date, permission bits and directory structure). A connection object can be created by calling the connect() method on a URL object.
Resource
Resource objects are file-like objects that work with the actual bytes that make up the file data. This functionality lives in the Resource class and its subclasses. A resource is created by calling the open() method on a Connection or a URL.
Module documentation
- ll.url.httpdate(dt)
Return a string suitable for a “Last-Modified” and “Expires” header. dt is a datetime.datetime object in UTC.
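The header format can be illustrated with the standard library alone. The following is a minimal sketch, not the implementation used by ll.url: the helper name httpdate_sketch is hypothetical, and it assumes the input is a naive datetime.datetime that already holds UTC.

```python
import datetime
from email.utils import format_datetime


def httpdate_sketch(dt):
    """Format a naive UTC datetime as an HTTP date string.

    Hypothetical helper for illustration; not part of ll.url.
    """
    # format_datetime(..., usegmt=True) requires an aware UTC datetime,
    # so the UTC timezone is attached without shifting the clock time.
    aware = dt.replace(tzinfo=datetime.timezone.utc)
    return format_datetime(aware, usegmt=True)


stamp = httpdate_sketch(datetime.datetime(2024, 1, 15, 12, 30, 0))
print(stamp)  # Mon, 15 Jan 2024 12:30:00 GMT
```

Passing a local-time value would produce a wrong header, which is why the argument is documented as UTC.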
- class ll.url.Context
Bases: object
Working with URLs (e.g. calling URL.open() or URL.connect()) involves Connection objects. To avoid constantly creating new connections you can pass a Context object to those methods. Connections will be stored in the Context object and reused by those methods.
A Context object can also be used as a context manager. This context object will then be used for all open() and connect() calls inside the with block. (Note that all connections will be closed at the end of the with block.)
- closeall()
Close and drop all connections in this context.
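The caching behaviour described above can be pictured with a small stand-in class. This is a sketch of the general pattern only: PoolSketch, its connect() signature and the use of io.StringIO as a fake connection are all assumptions made for illustration and say nothing about how Context is implemented.

```python
import io


class PoolSketch:
    """Hypothetical connection cache that doubles as a context manager."""

    def __init__(self):
        self.connections = {}

    def connect(self, key, factory):
        # Reuse the cached connection for this key, or create and store one.
        if key not in self.connections:
            self.connections[key] = factory()
        return self.connections[key]

    def closeall(self):
        # Close and drop every cached connection.
        for connection in self.connections.values():
            connection.close()
        self.connections.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with block closes everything, as noted above.
        self.closeall()
        return False


pool = PoolSketch()
with pool:
    first = pool.connect("ssh://host", io.StringIO)
    second = pool.connect("ssh://host", io.StringIO)
    reused = first is second
closed_after = first.closed
print(reused, closed_after)
```

The second connect() call returns the object created by the first one, and that object is closed once the with block ends.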
- class ll.url.Cursor
Bases: object
A Cursor object is used by the walk() method during directory traversal. It contains information about the state of the traversal and can be used to influence which directories are traversed and in which order.
Information about the state of the traversal is provided in the following attributes:
rooturl
The URL where traversal was started (i.e. the object for which the walk() method was called).
url
The current URL being traversed.
event
A string that specifies which event is currently being handled. Possible values are "beforedir", "afterdir" and "file". A "beforedir" event is emitted before a directory is entered. "afterdir" is emitted after a directory has been entered. "file" is emitted when a file is encountered.
isdir
True if url refers to a directory.
isfile
True if url refers to a regular file.
The following attributes specify which part of the tree should be traversed:
beforedir
Should the generator yield "beforedir" events?
afterdir
Should the generator yield "afterdir" events?
file
Should the generator yield "file" events?
enterdir
Should the directory be entered?
Note that if any of these attributes is changed by the code consuming the generator, the new value is used for the next traversal step once the generator is resumed and is reset to its initial value (specified in the constructor) afterwards.
- __init__(url, beforedir=True, afterdir=False, file=True, enterdir=False)
Create a new Cursor object for a tree traversal rooted at url.
The arguments beforedir, afterdir, file and enterdir are used as the initial values for the attributes of the same name (see the class docstring for information about their use).
- restore()
Restore the attributes beforedir, afterdir, file and enterdir to their initial values.
- class ll.url.Connection
Bases: object
A Connection object is used for accessing and modifying the metadata associated with a file. It is created by calling the connect() method on a URL object.
- lstat(url)
Return the result of a stat() call on the file url. Like stat(), but does not follow symbolic links.
- chmod(url, mode)
Set the access mode of the file url to mode.
- chown(url, owner=None, group=None)
Change the owner and/or group of the file url.
- lchown(url, owner=None, group=None)
Change the owner and/or group of the file url (ignoring symbolic links).
- uid(url)
Return the user id of the owner of the file url.
- gid(url)
Return the id of the group the file url belongs to.
- owner(url)
Return the name of the owner of the file url.
- group(url)
Return the name of the group the file url belongs to.
- mimetype(url)
Return the mimetype of the file url.
- exists(url)
Test whether the file url exists.
- isfile(url)
Test whether the resource url is a file.
- isdir(url)
Test whether the resource url is a directory.
- islink(url)
Test whether the resource url is a link.
- ismount(url)
Test whether the resource url is a mount point.
- access(url, mode)
Test for access to the file/resource url.
- size(url)
Return the size of the file url.
- imagesize(url)
Return the size of the image url (if the resource is an image file) as a (width, height) tuple. This requires PIL.
- cdate(url)
Return the “metadata change” date of the file/resource url as a datetime.datetime object in UTC.
- adate(url)
Return the last access date of the file/resource url as a datetime.datetime object in UTC.
- mdate(url)
Return the last modification date of the file/resource url as a datetime.datetime object in UTC.
- resheaders(url)
Return the MIME headers for the file/resource url.
- remove(url)
Remove the file url.
- rmdir(url)
Remove the directory url.
- rename(url, target)
Rename url to target. This might not work if target has a different scheme than url (or is on a different server).
- link(url, target)
Create a hard link from url to target. This will not work if target has a different scheme than url (or is on a different server).
- symlink(url, target)
Create a symbolic link from url to target. This will not work if target has a different scheme than url (or is on a different server).
- chdir(url)
Change the current directory to url.
- mkdir(url, mode=511)
Create the directory url. (The default mode 511 is 0o777 in octal.)
- makedirs(url, mode=511)
Create the directory url and all intermediate ones. (The default mode 511 is 0o777 in octal.)
- walk(url, beforedir=True, afterdir=False, file=True, enterdir=True)
Return an iterator for traversing the directory hierarchy rooted at the directory url.
Each item produced by the iterator is a Cursor object. It contains information about the state of the traversal and can be used to influence which parts of the directory hierarchy are traversed and in which order.
The arguments beforedir, afterdir, file and enterdir specify how the directory hierarchy should be traversed. For more information see the Cursor class.
Note that the Cursor object is reused by walk(), so you can’t rely on any attributes remaining the same across calls to next().
The following example shows how to traverse the current directory and print all files except those in certain directories:
from ll import url

for cursor in url.here().walk(beforedir=True, afterdir=False, file=True):
    if cursor.isdir:
        # Don't descend into version control, build or cache directories.
        if cursor.url.path[-2] in (".git", "build", "dist", "__pycache__"):
            cursor.enterdir = False
    else:
        print(cursor.url)
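For comparison, the same pruning idea can be written against the standard library, where os.walk() plays the role of walk() and editing the dirnames list in place corresponds to setting enterdir to False. This is only an analogy under those assumptions; the helper name walk_files and the temporary directory layout are made up for the demonstration.

```python
import os
import tempfile

SKIP = {".git", "build", "dist", "__pycache__"}


def walk_files(root):
    """Return all files below root, skipping directories named in SKIP."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to the slice prunes the traversal: os.walk() will not
        # descend into directories removed from dirnames.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP)
        for name in sorted(filenames):
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return found


with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: one regular directory, one that should be skipped.
    for rel in ("src/a.py", "build/out.o", "README"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    files = walk_files(root)

print(files)
```

The file inside build never shows up because the directory is removed from the traversal before it is entered.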
- listdir(url, include=None, exclude=None, ignorecase=False)
Iterates over items in the directory url. The items produced are URL objects relative to url.
With the optional include argument, this only lists items whose names match the given pattern. Items matching the optional pattern exclude will not be listed. include and exclude can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true, case-insensitive name matching will be performed.
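The interplay of include, exclude and ignorecase can be sketched with the fnmatch module. The function keep() below is a hypothetical helper written for this illustration; it shows one plausible reading of the rules above (include first, then exclude) and is not taken from ll.url.

```python
import fnmatch


def _as_list(patterns):
    # Accept a single pattern string or a list of pattern strings.
    return [patterns] if isinstance(patterns, str) else list(patterns)


def _hits(name, patterns, ignorecase):
    # True if name matches at least one of the patterns.
    for pattern in _as_list(patterns):
        if ignorecase:
            if fnmatch.fnmatchcase(name.lower(), pattern.lower()):
                return True
        elif fnmatch.fnmatchcase(name, pattern):
            return True
    return False


def keep(name, include=None, exclude=None, ignorecase=False):
    """Decide whether name passes the include/exclude filters."""
    if include is not None and not _hits(name, include, ignorecase):
        return False
    if exclude is not None and _hits(name, exclude, ignorecase):
        return False
    return True


names = ["README.TXT", "setup.py", "test_url.py", "notes.txt"]
selected = [
    n for n in names
    if keep(n, include=["*.py", "*.txt"], exclude="test_*", ignorecase=True)
]
strict = [n for n in names if keep(n, include="*.txt")]
print(selected)  # ['README.TXT', 'setup.py', 'notes.txt']
print(strict)    # ['notes.txt']
```

With ignorecase left at its default, README.TXT no longer matches the pattern *.txt, which is the difference between the two result lists.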
- files(url, include=None, exclude=None, ignorecase=False)
Iterates over files in the directory url. The items produced are URL objects relative to url.
With the optional include argument, this only lists files whose names match the given pattern. Files matching the optional pattern exclude will not be listed. include and exclude can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true, case-insensitive name matching will be performed.
- dirs(url, include=None, exclude=None, ignorecase=False)
Iterates over directories in the directory url. The items produced are URL objects relative to url.
With the optional include argument, this only lists directories whose names match the given pattern. Directories matching the optional pattern exclude will not be listed. include and exclude can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true, case-insensitive name matching will be performed.
- walkall(url, include=None, exclude=None, enterdir=None, skipdir=None, ignorecase=False)
Recursively iterate over files and subdirectories. The iterator yields URL objects naming each child URL of the directory url and its descendants relative to url. This performs a depth-first traversal, returning each directory before all its children.
With the optional include argument, only yield items whose names match the given pattern. Items matching the optional pattern exclude will not be listed. Directories that don’t match the optional pattern enterdir or match the pattern skipdir will not be traversed. include, exclude, enterdir and skipdir can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true, case-insensitive name matching will be performed.
- walkfiles(url, include=None, exclude=None, enterdir=None, skipdir=None, ignorecase=False)
Return a recursive iterator over files in the directory url.
With the optional include argument, only yield files whose names match the given pattern. Files matching the optional pattern exclude will not be listed. Directories that don’t match the optional pattern enterdir or match the pattern skipdir will not be traversed. include, exclude, enterdir and skipdir can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true, case-insensitive name matching will be performed.
- walkdirs(url, include=None, exclude=None, enterdir=None, skipdir=None, ignorecase=False)
Return a recursive iterator over subdirectories in the directory url.
With the optional include argument, only yield directories whose names match the given pattern. Directories matching the optional pattern exclude will not be listed. Directories that don’t match the optional pattern enterdir or match the pattern skipdir will not be traversed. include, exclude, enterdir and skipdir can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true, case-insensitive name matching will be performed.
- open(url, *args, **kwargs)
Open url for reading or writing. open() returns a Resource object.
Which additional parameters are supported depends on the actual resource created. Some common parameters are:
mode (str)
A string indicating how the file is to be opened (just like the mode argument of the builtin open(), e.g. "rb" or "wb").
headers (dict)
Additional headers to use for an HTTP request.
data (bytes)
Request body to use for an HTTP POST request.
python (str or None)
Name of the Python interpreter to use on the remote side (used by ssh URLs).
nice (int or None)
Nice level for the remote Python (used by ssh URLs).
check (bool or None)
Whether ssh host keys should be checked (used by ssh URLs, where it defaults to True, and ssh-nocheck URLs, where it defaults to False).
- class ll.url.LocalConnection
Bases: Connection
A LocalConnection object is used for accessing and modifying the metadata associated with a file in the local filesystem. It is created by calling the connect() method on a URL object with no scheme or the file or root scheme.
- class ll.url.SshConnection
Bases: Connection
An SshConnection object is used for accessing and modifying the metadata associated with a file on a remote filesystem. Remote files will be accessed via code executed remotely on the target host via execnet.
SshConnection objects are created by calling the connect() method on a URL object with the ssh or ssh-nocheck scheme.
Note
Using the scheme ssh-nocheck disables checks of the host key, i.e. it passes -o "StrictHostKeyChecking=no" to the underlying ssh command.
If you need to use further options (e.g. when your known_hosts file isn’t writable), you should configure that in your ~/.ssh/config file, for example:
Host foo
    Hostname foo.example.org
    StrictHostKeyChecking no
    UserKnownHostsfile /dev/null
or for Windows:
Host foo
    Hostname foo.example.org
    StrictHostKeyChecking no
    UserKnownHostsfile nul:
- class ll.url.URLConnection
Bases: Connection
A URLConnection object is used for accessing and modifying the metadata associated with any other resource specified by a URL (except those handled by the other Connection subclasses).
- ll.url.home(user='', scheme='file')
Return the home directory of the current user (or the user named user, if user is specified) as a URL object:
>>> url.home()
URL('file:/home/walter/')
>>> url.home("andreas")
URL('file:/home/andreas/')
- ll.url.File(name, scheme='file')
Turn a filename into a URL object:
>>> url.File("a#b")
URL('file:a%23b')
- ll.url.Dir(name, scheme='file')
Turn a directory name into a URL object, just like File(), but ensure that the path is terminated with a /:
>>> url.Dir("a#b")
URL('file:a%23b/')
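The %23 in the results above is ordinary percent-encoding: a literal # in a filename would otherwise be read as the start of a fragment. A minimal standard-library sketch of the same escaping follows; file_url_sketch and dir_url_sketch are hypothetical names that build plain strings, whereas File() and Dir() return URL objects.

```python
from urllib.parse import quote


def file_url_sketch(name):
    """Percent-encode a filename and prefix the file scheme."""
    # quote() leaves "/" alone by default and escapes "#" as "%23".
    return "file:" + quote(name)


def dir_url_sketch(name):
    """Like file_url_sketch(), but guarantee a trailing slash."""
    text = file_url_sketch(name)
    return text if text.endswith("/") else text + "/"


print(file_url_sketch("a#b"))  # file:a%23b
print(dir_url_sketch("a#b"))   # file:a%23b/
```

A name that already ends in a slash is left unchanged by dir_url_sketch(), so the separator is never doubled.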
- ll.url.Ssh(user, host, path='~/')
Return an ssh URL for the user user on the host host with the path path. path (defaulting to the user’s home directory) must be a path in URL notation (i.e. use / as the directory separator):
>>> url.Ssh("root", "www.example.com", "~joe/public_html/index.html")
URL('ssh://root@www.example.com/~joe/public_html/index.html')
If the path starts with ~/ it is relative to this user’s home directory; if it starts with ~user it is relative to the home directory of the user user. In all other cases the path is considered to be absolute.
- ll.url.first(urls)
Return the first URL from urls that exists as a real file or directory. None entries in urls will be skipped.
- ll.url.firstdir(urls)
Return the first URL from urls that exists as a real directory. None entries in urls will be skipped.
- ll.url.firstfile(urls)
Return the first URL from urls that exists as a real file. None entries in urls will be skipped.
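The selection logic shared by these three functions can be sketched for plain filesystem paths. The helper first_existing is hypothetical, and returning None when nothing matches is an assumption of this sketch; the text above does not say what the functions do in that case.

```python
import os
import tempfile


def first_existing(paths, test=os.path.exists):
    """Return the first path that passes test, skipping None entries."""
    for path in paths:
        if path is None:
            continue
        if test(path):
            return path
    return None  # assumption: nothing matched


with tempfile.TemporaryDirectory() as tmp:
    real_file = os.path.join(tmp, "settings.ini")
    open(real_file, "w").close()
    missing = os.path.join(tmp, "missing.ini")

    # Analogous to first(), firstdir() and firstfile() respectively.
    any_hit = first_existing([None, missing, real_file, tmp])
    dir_hit = first_existing([None, real_file, tmp], test=os.path.isdir)
    file_hit = first_existing([tmp, None, real_file], test=os.path.isfile)
    results = (any_hit == real_file, dir_hit == tmp, file_hit == real_file)

print(results)
```

A typical use of this pattern is a list of candidate configuration locations, where optional entries may be None.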
- class ll.url.Resource
Bases: object
A Resource is a base class that provides a file-like interface to local and remote files, URLs and other resources.
Each resource object has the following attributes:
In addition to file methods (like read(), readlines(), write() and close()) a resource object might provide the following methods:
finalurl()
Return the real URL of the resource (this might be different from the url attribute in case of a redirect).
size()
Return the size of the file/resource.
mdate()
Return the last modification date of the file/resource as a datetime.datetime object in UTC.
mimetype()
Return the mimetype of the file/resource.
imagesize()
Return the size of the image (if the resource is an image file) as a (width, height) tuple. This requires PIL.
- class ll.url.RemoteFileResource
Bases: Resource
A subclass of Resource that handles remote files (i.e. those using the ssh scheme).
- class ll.url.URLResource
Bases: Resource
A subclass of Resource that handles HTTP, FTP and other URLs (i.e. those that are not handled by FileResource or RemoteFileResource).
- class ll.url.SchemeDefinition
Bases: object
A SchemeDefinition instance defines the properties of a particular URL scheme.
- __init__(scheme, usehierarchy, useserver, usefrag, islocal=False, isremote=False, defaultport=None)
Create a new SchemeDefinition instance. Arguments are:
scheme: The name of the scheme;
usehierarchy: Specifies whether this scheme uses hierarchical URLs or opaque URLs (i.e. whether hier_part or opaque_part from the BNF in RFC 2396 is used);
useserver: Specifies whether this scheme uses an Internet-based server authority component or a registry of naming authorities (only for hierarchical URLs);
usefrag: Specifies whether this scheme uses fragments (according to the BNF in RFC 2396 every scheme does, but it doesn’t make sense for e.g. "javascript", "mailto" or "tel");
islocal: Specifies whether URLs with this scheme refer to local files;
isremote: Specifies whether URLs with this scheme refer to remote files (there may be schemes which are neither local nor remote, e.g. "mailto");
defaultport: The default port for this scheme (only for schemes using server based authority).
- connect(url, context=None, **kwargs)
Create a Connection for the URL url (which must have self as the scheme).
- closeall(context)
Close all connections active for this scheme in the context context.
- class ll.url.URL
Bases: object
An RFC 2396 compliant URL.
- __init__(url=None)
Create a new URL instance. url may be a str object, or a URL (in which case you’ll get a copy of url), or None (which will create a URL referring to the “current document”).
- clone()
Return an identical copy of self.
- withfile(file)
Return a new URL where the filename (i.e. the name of the last component of path_segments) has been replaced with file.
- __truediv__(other)
Join self with another (possibly relative) URL other, to form a new URL.
other may be a str or URL object. It may be None (referring to the “current document”), in which case self will be returned. It may also be a list or other iterable. In this case a list (or iterator) will be returned where __truediv__() has been applied to every item in the list/iterator. E.g. the following expression returns all the files in the current directory as absolute URLs (see the method files() and the function here() for further explanations):
>>> here = url.here()
>>> for f in here/here.files():
...     print(f)
- __rtruediv__(other)
Right hand version of __truediv__(). This supports lists and iterables as the left hand side too.
- relative(baseurl, allowschemerel=False)
Return a relative URL rel such that baseurl/rel == self, i.e. this is the inverse operation of __truediv__().
If self is relative, has a different scheme or authority than baseurl, or uses a non-hierarchical scheme, an identical copy of self will be returned.
If allowschemerel is true, scheme relative URLs are allowed, i.e. if both self and baseurl use the same hierarchical scheme, but a different authority (i.e. server), a scheme relative URL (//server/path/file.html) will be returned.
- __bool__()
Return whether the URL is not empty, i.e. whether it is not the URL referring to the start of the current document.
- __eq__(other)
Return whether two URL objects are equal. Note that only properties relevant for the current scheme will be compared.
- __hash__()
Return a hash value for self, to be able to use URL objects as dictionary keys. You must be careful not to modify a URL as soon as you use it as a dictionary key.
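The reason for this warning applies to any mutable object whose hash depends on its state. The sketch below uses a deliberately tiny stand-in class, Key, which is hypothetical and unrelated to the URL implementation; it only demonstrates how a dictionary entry becomes unreachable once its key is modified.

```python
class Key:
    """Minimal mutable object that hashes on its current state."""

    def __init__(self, path):
        self.path = path

    def __eq__(self, other):
        return isinstance(other, Key) and self.path == other.path

    def __hash__(self):
        return hash(self.path)


key = Key("a/b")
table = {key: "value"}
found_before = Key("a/b") in table

# Modifying the key after insertion leaves the entry filed under the old hash.
key.path = "a/c"
found_old = Key("a/b") in table  # hash matches, but equality now fails
found_new = Key("a/c") in table  # equality would match, but the hash differs

print(found_before, found_old, found_new, len(table))
```

The entry still occupies space in the dictionary, yet neither the old nor the new value can be used to look it up.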
- abs(scheme=-1)
Return an absolute version of self (works only for local URLs).
If the argument scheme is specified, it will be used for the resulting URL, otherwise the result will have the same scheme as self.
- real(scheme=-1)
Return the canonical version of self, eliminating all symbolic links (works only for local URLs).
If the argument scheme is specified, it will be used for the resulting URL, otherwise the result will have the same scheme as self.
- islocal()
Return whether self refers to a local file, i.e. whether self is a relative URL or the scheme is root or file.
- local()
Return self as a local filename (which only works if self is local, see islocal()).
- connect(context=None, **kwargs)
Return a Connection object for accessing and modifying the metadata of self.
Whether you get a new connection object or an existing one depends on the scheme, the URL itself, and the context passed in (as the context argument).
- open(*args, **kwargs)
Open self for reading or writing. open() returns a Resource object.
Which additional parameters are supported depends on the actual resource created. Some common parameters are:
mode (supported by all resources)
A string indicating how the file is to be opened (just like the mode argument of the builtin open(), e.g. "rb" or "wb").
context (supported by all resources)
open() needs a Connection for this URL, which it gets from a Context object.
headers
Additional headers to use for an HTTP request.
data
Request body to use for an HTTP POST request.
python
Name of the Python interpreter to use on the remote side (used by ssh URLs).
nice
Nice level for the remote Python (used by ssh URLs).
- import_(name=None)
Import the content of the URL self as a Python module.
name can be used to specify the module name (i.e. the __name__ attribute of the module). By default the name is determined from the URL.
Special features of ll.url
The class ll.url.URL supports many common schemes and one additional special scheme named root that deserves an explanation.
A root URL is supposed to be a URL that is relative to a “project” directory instead of to the base URL of the document that contains the URL.
Suppose we have a document with the following base URL:
>>> from ll import url
>>> base = url.URL("root:company/it/about/index.html")
Now, if we have the following relative URL in this document:
>>> url1 = url.URL("images/logos/spam.png")
the combined URL will be:
>>> base/url1
URL('root:company/it/about/images/logos/spam.png')
Now if we use this combined URL and interpret it relative to the base URL, we get back our original relative URL:
>>> (base/url1).relative(base)
URL('images/logos/spam.png')
Let’s try a root URL now:
>>> url2 = url.URL("root:images/logos/spam.png")
Combining this URL with the base URL gives us the same as url2:
>>> base/url2
URL('root:images/logos/spam.png')
But if we interpret this result relative to base, we’ll get:
>>> (base/url2).relative(base)
URL('../../../images/logos/spam.png')
I.e. this gives us a relative URL that references url2 from base when both URLs are relative to the same root directory.
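The path arithmetic behind the two relative() results can be reproduced with posixpath, treating the shared root directory as /. This is a sketch of the underlying idea only, under that assumption; relative_sketch is a hypothetical helper and does not involve URL objects or the root scheme itself.

```python
import posixpath


def relative_sketch(target, base):
    """Path from the directory of base to target, both below one root."""
    # Anchor both paths at "/" so the result does not depend on the
    # current working directory.
    base_dir = posixpath.dirname("/" + base)
    return posixpath.relpath("/" + target, base_dir)


base = "company/it/about/index.html"
rel1 = relative_sketch("company/it/about/images/logos/spam.png", base)
rel2 = relative_sketch("images/logos/spam.png", base)
print(rel1)  # images/logos/spam.png
print(rel2)  # ../../../images/logos/spam.png
```

The three .. segments in the second result correspond to the three directories (company, it, about) that separate the base document from the root directory.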