ll.url – RFC 2396 compliant URLs

ll.url contains an RFC 2396 compliant implementation of URLs, classes for accessing resource metadata, and file-like classes for reading and writing resource data.

These three levels of functionality are implemented in three classes:

URL
URL objects are the names of resources and can be used and modified regardless of whether these resources actually exist. URL objects never hit the hard drive or the net.
Connection
Connection objects contain functionality that accesses and changes file metadata (like last modified date, permission bits, directory structure etc.). A connection object can be created by calling the connect() method on a URL object.
Resource
Resource objects are file-like objects that work with the actual bytes that make up the file data. This functionality lives in the Resource class and its subclasses. A resource is created by calling the open() method on a Connection or a URL.
class ll.url.Connection

Bases: object

A Connection object is used for accessing and modifying the metadata associated with a file. It is created by calling the connect() method on a URL object.

access

Test for access to the file/resource url.

adate

Return the last access date of the file/resource url as a datetime.datetime object in UTC.

cdate

Return the “metadata change” date of the file/resource url as a datetime.datetime object in UTC.

chdir

Change the current directory to url.

chmod

Set the access mode of the file url to mode.

chown

Change the owner and/or group of the file url.

dirs

Iterates over directories in the directory url. The items produced are URL objects relative to url.

With the optional include argument, this only lists directories whose names match the given pattern. Directories matching the optional pattern exclude will not be listed. include and exclude can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true case-insensitive name matching will be performed.

exists

Test whether the file url exists.

files

Iterates over files in the directory url. The items produced are URL objects relative to url.

With the optional include argument, this only lists files whose names match the given pattern. Files matching the optional pattern exclude will not be listed. include and exclude can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true case-insensitive name matching will be performed.

gid

Return the group id the file url belongs to.

group

Return the name of the group the file url belongs to.

imagesize

Return the size of the image url (if the resource is an image file) as a (width, height) tuple. This requires the PIL.

isdir

Test whether the resource url is a directory.

isfile

Test whether the resource url is a file.

islink

Test whether the resource url is a link.

ismount

Test whether the resource url is a mount point.

lchown

Change the owner and/or group of the file url (ignoring symbolic links).

link

Create a hard link from url to target. This will not work if target has a different scheme than url (or is on a different server).

listdir

Iterates over items in the directory url. The items produced are URL objects relative to url.

With the optional include argument, this only lists items whose names match the given pattern. Items matching the optional pattern exclude will not be listed. include and exclude can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true case-insensitive name matching will be performed.
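
The include/exclude rule shared by dirs(), files() and listdir() can be pictured with the standard library's fnmatch module. The following is a minimal sketch of the matching rule as described above, not the code ll.url actually uses; the helper names `_as_list` and `matches` are hypothetical.

```python
import fnmatch


def _as_list(patterns):
    """Normalize None, a single pattern string or a list of patterns to a list."""
    if patterns is None:
        return []
    if isinstance(patterns, str):
        return [patterns]
    return list(patterns)


def matches(name, include=None, exclude=None, ignorecase=False):
    """Return True if name passes the include patterns and no exclude pattern."""
    include = _as_list(include)
    exclude = _as_list(exclude)
    if ignorecase:
        name = name.lower()
        include = [p.lower() for p in include]
        exclude = [p.lower() for p in exclude]
    # An empty include list means "everything is included".
    if include and not any(fnmatch.fnmatchcase(name, p) for p in include):
        return False
    return not any(fnmatch.fnmatchcase(name, p) for p in exclude)


names = ["setup.py", "README.rst", "test_url.py", "NOTES.TXT"]
selected = [n for n in names if matches(n, include="*.py", exclude="test_*")]
print(selected)
```

Here only `setup.py` survives: `test_url.py` matches the include pattern but is removed again by the exclude pattern.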

lstat

Return the result of a stat() call on the file url. Like stat(), but does not follow symbolic links.

makedirs

Create the directory url and all intermediate ones.

mdate

Return the last modification date of the file/resource url as a datetime.datetime object in UTC.
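
For a local file, the same kind of information can be obtained with the standard library alone. This is a sketch for comparison, not ll.url's implementation; the helper name `mdate_utc` is hypothetical, and it returns a timezone-aware datetime, which may differ from what mdate() returns.

```python
import datetime
import os
import tempfile


def mdate_utc(path):
    """Return the last modification time of path as an aware UTC datetime."""
    timestamp = os.stat(path).st_mtime
    return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    open(path, "w").close()
    # Set access and modification time to 2000-01-01 00:00:00 UTC.
    os.utime(path, (946684800, 946684800))
    stamp = mdate_utc(path)

print(stamp.isoformat())
```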

mimetype

Return the mimetype of the file url.

mkdir

Create the directory url.

open

Open url for reading or writing. open() returns a Resource object.

Which additional parameters are supported depends on the actual resource created. Some common parameters are:

mode : string
A string indicating how the file is to be opened (just like the mode argument for the builtin open(); e.g. "rb" or "wb").
headers : mapping
Additional headers to use for an HTTP request.
data : byte string
Request body to use for an HTTP POST request.
python : string or None
Name of the Python interpreter to use on the remote side (used by ssh URLs)
nice : int or None
Nice level for the remote python (used by ssh URLs)
owner

Return the name of the owner of the file url.

remove

Remove the file url.

rename

Renames url to target. This might not work if target has a different scheme than url (or is on a different server).

resheaders

Return the MIME headers for the file/resource url.

rmdir

Remove the directory url.

size

Return the size of the file url.

stat

Return the result of a stat() call on the file url.

symlink

Create a symbolic link from url to target. This will not work if target has a different scheme than url (or is on a different server).

uid

Return the user id of the owner of the file url.

walk

Return an iterator for traversing the directory hierarchy rooted at the directory url.

Each item produced by the iterator is a Cursor object. It contains information about the state of the traversal and can be used to influence which parts of the directory hierarchy are traversed and in which order.

The arguments beforedir, afterdir, file and enterdir specify how the directory hierarchy should be traversed. For more information see the Cursor class.

Note that the Cursor object is reused by walk(), so you can’t rely on any attributes remaining the same across calls to next().

The following example shows how to traverse the current directory and print all files except those in certain directories:

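from ll import url

# Emit "beforedir" and "file" events, but no "afterdir" events.
for cursor in url.here().walk(beforedir=True, afterdir=False, file=True):
    if cursor.isdir:
        # A directory URL ends with "/", so its name is the second-to-last path segment.
        if cursor.url.path[-2] in (".git", "build", "dist", "__pycache__"):
            # Do not descend into this directory.
            cursor.enterdir = False
    else:
        print(cursor.url)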
walkall

Recursively iterate over files and subdirectories. The iterator yields URL objects naming each child URL of the directory url and its descendants relative to url. This performs a depth-first traversal, returning each directory before all its children.

With the optional include argument, only yield items whose names match the given pattern. Items matching the optional pattern exclude will not be listed. Directories that don’t match the optional pattern enterdir or match the pattern skipdir will not be traversed. include, exclude, enterdir and skipdir can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true case-insensitive name matching will be performed.
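
The interplay of the four patterns can be sketched with the standard library. The code below is an illustration of one reading of the paragraph above (include/exclude decide what is reported, enterdir/skipdir decide where the traversal descends) and is not ll.url's implementation; the names `_hit` and `walkall_sketch` are hypothetical, and ignorecase is left out for brevity.

```python
import fnmatch
import os
import tempfile


def _hit(name, patterns):
    """True if name matches one of the fnmatch patterns (a string or a list)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def walkall_sketch(root, include="*", exclude=(), enterdir="*", skipdir=()):
    """Yield paths relative to root, each directory before its children."""
    def visit(directory, prefix):
        for entry in sorted(os.scandir(directory), key=lambda e: e.name):
            isdir = entry.is_dir(follow_symlinks=False)
            rel = prefix + entry.name + ("/" if isdir else "")
            # include/exclude decide what is reported ...
            if _hit(entry.name, include) and not _hit(entry.name, exclude):
                yield rel
            # ... enterdir/skipdir decide where the traversal descends.
            if isdir and _hit(entry.name, enterdir) and not _hit(entry.name, skipdir):
                yield from visit(entry.path, rel)

    yield from visit(root, "")


with tempfile.TemporaryDirectory() as root:
    for rel in ("docs/index.rst", "build/out.o", "setup.py"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = list(walkall_sketch(root, exclude="*.o", skipdir="build"))

print(found)
```

In this sketch the `build/` directory itself is still reported, but nothing below it is visited.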

walkdirs

Return a recursive iterator over subdirectories in the directory url.

With the optional include argument, only yield directories whose names match the given pattern. Items matching the optional pattern exclude will not be listed. Directories that don’t match the optional pattern enterdir or match the pattern skipdir will not be traversed. include, exclude, enterdir and skipdir can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true case-insensitive name matching will be performed.

walkfiles

Return a recursive iterator over files in the directory url.

With the optional include argument, only yield files whose names match the given pattern. Files matching the optional pattern exclude will not be listed. Directories that don’t match the optional pattern enterdir or match the pattern skipdir will not be traversed. include, exclude, enterdir and skipdir can be strings (which will be interpreted as fnmatch style filename patterns) or lists of strings. If ignorecase is true case-insensitive name matching will be performed.

class ll.url.Context

Bases: object

Working with URLs (e.g. calling URL.open() or URL.connect()) involves Connection objects. To avoid constantly creating new connections you can pass a Context object to those methods. Connections will be stored in the Context object and will be reused by those methods.

A Context object can also be used as a context manager. This context object will be used for all open() and connect() calls inside the with block. (Note that after the end of the with block all connections will be closed.)
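
The general pattern behind this (a cache of connections that doubles as a context manager) can be sketched as follows. All class names here (`FakeConnection`, `ConnectionCache`) are hypothetical stand-ins for illustration; they are not part of ll.url and say nothing about how Context is implemented internally.

```python
class FakeConnection:
    """Stand-in for an expensive connection (e.g. to a remote server)."""

    def __init__(self, server):
        self.server = server
        self.closed = False

    def close(self):
        self.closed = True


class ConnectionCache:
    """Keeps one connection per server and closes them all on exit."""

    def __init__(self):
        self._connections = {}

    def get(self, server):
        # Reuse an existing connection instead of creating a new one.
        if server not in self._connections:
            self._connections[server] = FakeConnection(server)
        return self._connections[server]

    def closeall(self):
        for connection in self._connections.values():
            connection.close()
        self._connections.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closeall()
        return False


with ConnectionCache() as cache:
    first = cache.get("www.example.com")
    second = cache.get("www.example.com")
    reused = first is second

print(reused, first.closed)
```

Both requests inside the with block share one connection, and that connection is closed once the block ends.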

closeall

Close and drop all connections in this context.

class ll.url.Cursor

Bases: object

A Cursor object is used by the walk() method during directory traversal. It contains information about the state of the traversal and can be used to influence which directories are traversed and in which order.

Information about the state of the traversal is provided in the following attributes:

rooturl
The URL where traversal has been started (i.e. the object for which the walk() method has been called)
url
The current URL being traversed.
event
A string that specifies which event is currently handled. Possible values are: "beforedir", "afterdir" and "file". A "beforedir" event is emitted before a directory is entered. "afterdir" is emitted after a directory has been entered. "file" is emitted when a file is encountered.
isdir
True if url refers to a directory.
isfile
True if url refers to a regular file.

The following attributes specify which part of the tree should be traversed:

beforedir
Should the generator yield "beforedir" events?
afterdir
Should the generator yield "afterdir" events?
file
Should the generator yield "file" events?
enterdir
Should the directory be entered?

Note that if any of these attributes is changed by the code consuming the generator, this new value will be used for the next traversal step once the generator is resumed and will be reset to its initial value (specified in the constructor) afterwards.
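
The "change applies to the next step only" behaviour can be illustrated with a toy generator over a nested dictionary. This is a simplified sketch of the idea, not the Cursor class itself: `ToyCursor` and `toy_walk` are hypothetical names, and only the enterdir attribute is modelled.

```python
class ToyCursor:
    """Minimal cursor with a single one-shot attribute, enterdir."""

    def __init__(self, enterdir=True):
        self._initial = enterdir
        self.enterdir = enterdir
        self.name = None
        self.isdir = False

    def restore(self):
        self.enterdir = self._initial


def toy_walk(tree, cursor, prefix=""):
    """tree maps names to nested dicts (directories) or None (files)."""
    for name in sorted(tree):
        child = tree[name]
        cursor.name = prefix + name
        cursor.isdir = child is not None
        yield cursor  # the consumer may change cursor.enterdir here
        enter = cursor.isdir and cursor.enterdir
        cursor.restore()  # the override only lasts for this one step
        if enter:
            yield from toy_walk(child, cursor, prefix + name + "/")


tree = {"build": {"out.o": None}, "src": {"main.py": None}, "README": None}
cursor = ToyCursor()
seen = []
for c in toy_walk(tree, cursor):
    if c.isdir and c.name == "build":
        c.enterdir = False  # skip the contents of "build" only
    seen.append(c.name)

print(seen)
```

Setting enterdir to False for `build` does not affect `src`, because the attribute is reset before the next item is produced.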

__init__

Create a new Cursor object for a tree traversal rooted at the node node.

The arguments beforedir, afterdir, file and enterdir are used as the initial values for the attributes of the same name. (see the class docstring for info about their use).

restore

Restore the attributes beforedir, afterdir, file and enterdir to their initial value.

ll.url.Dir

Turns a directory name into an URL object, just like File(), but ensures that the path is terminated with a /:

>>> url.Dir("a#b")
URL('file:a%23b/')
ll.url.File

Turn a filename into an URL object:

>>> url.File("a#b")
URL('file:a%23b')
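
The `%23` in both results is the percent-encoded form of `#`, which would otherwise start a fragment. The standard library shows the same effect; this is an illustration with urllib.parse, not a description of how File() and Dir() are implemented.

```python
from urllib.parse import quote, urlsplit

quoted = quote("a#b")
print(quoted)

# Unquoted, the "#" splits the name into a path and a fragment.
raw = urlsplit("file:a#b")
print(raw.path, raw.fragment)

# Quoted, the whole name stays in the path.
safe = urlsplit("file:" + quoted)
print(safe.path, repr(safe.fragment))
```
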
class ll.url.FileResource

Bases: ll.url.Resource

A subclass of Resource that handles local files.

class ll.url.LocalConnection

Bases: ll.url.Connection

class ll.url.LocalSchemeDefinition

Bases: ll.url.SchemeDefinition

class ll.url.Query

Bases: dict

class ll.url.RemoteFileResource

Bases: ll.url.Resource

A subclass of Resource that handles remote files (those using the ssh scheme).

class ll.url.Resource

Bases: object

A Resource is a base class that provides a file-like interface to local and remote files, URLs and other resources.

Each resource object has the following attributes:

url
The URL for which this resource has been opened (i.e. foo.open().url is foo if foo is a URL object);
name
A string version of url;
closed
A bool specifying whether the resource has been closed (i.e. whether the close() method has been called).

In addition to file methods (like read(), readlines(), write() and close()) a resource object might provide the following methods:

finalurl()
Return the real URL of the resource (this might be different from the url attribute in case of a redirect).
size()
Return the size of the file/resource.
mdate()
Return the last modification date of the file/resource as a datetime.datetime object in UTC.
mimetype()
Return the mimetype of the file/resource.
imagesize()
Return the size of the image (if the resource is an image file) as a (width, height) tuple. This requires the PIL.
class ll.url.SchemeDefinition

Bases: object

A SchemeDefinition instance defines the properties of a particular URL scheme.

__init__

Create a new SchemeDefinition instance. Arguments are:

  • scheme: The name of the scheme;
  • usehierarchy: Specifies whether this scheme uses hierarchical URLs or opaque URLs (i.e. whether hier_part or opaque_part from the BNF in RFC 2396 is used);
  • useserver: Specifies whether this scheme uses an Internet-based server authority component or a registry of naming authorities (only for hierarchical URLs);
  • usefrag: Specifies whether this scheme uses fragments (according to the BNF in RFC 2396 every scheme does, but it doesn’t make sense for e.g. "javascript", "mailto" or "tel");
  • islocal: Specifies whether URLs with this scheme refer to local files;
  • isremote: Specifies whether URLs with this scheme refer to remote files (there may be schemes which are neither local nor remote, e.g. "mailto");
  • defaultport: The default port for this scheme (only for schemes using server based authority).
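
The difference between hierarchical and opaque URLs, and the role of a default port, can be seen with urllib.parse from the standard library. This sketch is for illustration only and does not use SchemeDefinition; the `DEFAULT_PORTS` table is a hypothetical stand-in for the defaultport argument.

```python
from urllib.parse import urlsplit

# Hypothetical default port table, standing in for the defaultport argument.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

hier = urlsplit("http://www.example.com:8080/docs/index.html#top")
plain = urlsplit("http://www.example.com/docs/index.html")
opaque = urlsplit("mailto:joe@example.com")

# Hierarchical URL: server, port, path and fragment are separate parts.
print(hier.hostname, hier.port, hier.path, hier.fragment)

# No explicit port: fall back to the default for the scheme.
effective_port = plain.port or DEFAULT_PORTS[plain.scheme]
print(effective_port)

# Opaque URL: no server component, everything after the scheme is one part.
print(repr(opaque.netloc), opaque.path)
```
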
closeall

Close all connections active for this scheme in the context context.

connect

Create a Connection for the URL url (which must have self as the scheme).

ll.url.Ssh

Return a ssh URL for the user user on the host host with the path path. path (defaulting to the user's home directory) must be a path in URL notation (i.e. use / as directory separator):

>>> url.Ssh("root", "www.example.com", "~joe/public_html/index.html")
URL('ssh://root@www.example.com/~joe/public_html/index.html')

If the path starts with ~/ it is relative to this user's home directory; if it starts with ~user it's relative to the home directory of the user user. In all other cases the path is considered to be absolute.
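
The three cases can be written down as a small decision function. This is only a sketch of the rule stated above; `classify_ssh_path` is a hypothetical helper and not part of ll.url.

```python
def classify_ssh_path(path):
    """Return (what the path is relative to, remaining path)."""
    if path.startswith("~/"):
        return ("home directory of the login user", path[2:])
    if path.startswith("~"):
        user, _, rest = path[1:].partition("/")
        return (f"home directory of {user}", rest)
    return ("absolute", path)


for example in ("~/notes.txt", "~joe/public_html/index.html", "var/log/messages"):
    print(example, "->", classify_ssh_path(example))
```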

class ll.url.SshConnection

Bases: ll.url.Connection

class ll.url.SshSchemeDefinition

Bases: ll.url.SchemeDefinition

class ll.url.ThreadLocalContext

Bases: _thread._local

class ll.url.URL

Bases: object

An RFC 2396 compliant URL.

__bool__

Return whether the URL is not empty, i.e. whether it is not the URL referring to the start of the current document.

__eq__

Return whether two URL objects are equal. Note that only properties relevant for the current scheme will be compared.

__hash__

Return a hash value for self, to be able to use URL objects as dictionary keys. Be careful not to modify an URL once it is in use as a dictionary key.
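
Why this matters is a general property of Python dictionaries and not specific to URL objects. The sketch below uses a hypothetical `MutableKey` class (not part of ll.url) to show what happens when a hashable object is changed after being used as a key.

```python
class MutableKey:
    """A hashable object whose hash depends on mutable state."""

    def __init__(self, path):
        self.path = path

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.path == other.path

    def __hash__(self):
        return hash(self.path)


key = MutableKey("index.html")
cache = {key: "cached page"}
found_before = key in cache

key.path = "about.html"  # modifying the key changes its hash
found_after = key in cache
found_by_old_value = MutableKey("index.html") in cache

print(found_before, found_after, found_by_old_value, len(cache))
```

After the modification the entry is still stored in the dictionary, but it can no longer be found, neither under the new value nor under the old one.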

__init__

Create a new URL instance. url may be a str object, or an URL (in which case you’ll get a copy of url), or None (which will create an URL referring to the “current document”).

__ne__

Return whether two URL objects are not equal.

__rtruediv__

Right hand version of __truediv__(). This supports lists and iterables as the left hand side too.

__truediv__

Join self with another (possibly relative) URL other, to form a new URL.

other may be a str or URL object. It may be None (referring to the “current document”), in which case self will be returned. It may also be a list or other iterable; in this case a list (or iterator) will be returned where __truediv__() has been applied to every item. E.g. the following expression returns all the files in the current directory as absolute URLs (see the method files() and the function here() for further explanations):

>>> here = url.here()
>>> for f in here/here.files():
...     print(f)
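
For comparison, the standard library offers the same kind of joining for plain strings via urllib.parse.urljoin. The sketch below is not ll.url code (and urljoin follows the newer RFC 3986), but for simple cases like these it shows the joining rules that the / operator expresses, including the element-wise application to a list.

```python
from urllib.parse import urljoin

base = "http://www.example.com/company/about/index.html"

# Joining with a single relative URL.
joined = urljoin(base, "images/logo.png")
print(joined)

# Joining with several relative URLs, one result per item.
many = [urljoin(base, rel) for rel in ("a.html", "../b.html", "/c.html")]
for item in many:
    print(item)
```
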
abs

Return an absolute version of self (works only for local URLs).

If the argument scheme is specified, it will be used for the resulting URL; otherwise the result will have the same scheme as self.

clone

Return an identical copy of self.

connect

Return a Connection object for accessing and modifying the metadata of self.

Whether you get a new connection object or an existing one depends on the scheme, the URL itself, and the context passed in (as the context argument).

import_

Import the content of the URL self as a Python module.

name can be used to specify the module name (i.e. the __name__ attribute of the module). By default the name is derived from the URL.

islocal

Return whether self refers to a local file (i.e. whether self is a relative URL or the scheme is root or file).

local

Return self as a local filename (which only works if self is local; see islocal()).

open

Open self for reading or writing. open() returns a Resource object.

Which additional parameters are supported depends on the actual resource created. Some common parameters are:

mode (supported by all resources)
A string indicating how the file is to be opened (just like the mode argument for the builtin open(); e.g. "rb" or "wb").
context (supported by all resources)
open() needs a Connection for this URL which it gets from a Context object.
headers
Additional headers to use for an HTTP request.
data
Request body to use for an HTTP POST request.
python
Name of the Python interpreter to use on the remote side (used by ssh URLs)
nice
Nice level for the remote python (used by ssh URLs)
real

Return the canonical version of self, eliminating all symbolic links (works only for local URLs).

If the argument scheme is specified, it will be used for the resulting URL; otherwise the result will have the same scheme as self.

relative

Return a relative URL rel such that baseurl/rel == self, i.e. this is the inverse operation of __truediv__().

If self is relative, has a different scheme or authority than baseurl or a non-hierarchical scheme, an identical copy of self will be returned.

If allowschemerel is true, scheme relative URLs are allowed, i.e. if both self and baseurl use the same hierarchical scheme but a different authority (i.e. server), a scheme relative URL (//server/path/file.html) will be returned.

withext

Return a new URL where the filename extension has been replaced with ext.

withfile

Return a new URL where the filename (i.e. the name of the last component of path_segments) has been replaced with file.

withfrag

Return a new URL where the fragment has been replaced with frag.

withoutext

Return a new URL where the filename extension has been removed.

withoutfrag

Return a new URL where the fragment has been dropped.
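
The effect of these with…/without… methods can be compared to similar operations in the standard library. The following sketch uses pathlib and urllib.parse on plain strings. It is an analogy only, and the exact argument conventions of the ll.url methods (for example whether ext includes the leading dot) are not shown here.

```python
from pathlib import PurePosixPath
from urllib.parse import urldefrag

path = PurePosixPath("docs/index.rst")

replaced_ext = str(path.with_suffix(".html"))   # like replacing the extension
removed_ext = str(path.with_suffix(""))         # like removing the extension
replaced_file = str(path.with_name("about.rst"))  # like replacing the filename
dropped_frag = urldefrag("docs/index.html#top").url  # like dropping the fragment

print(replaced_ext, removed_ext, replaced_file, dropped_frag)
```

Like the URL methods, all of these return new objects and leave the original value unchanged.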

class ll.url.URLConnection

Bases: ll.url.Connection

class ll.url.URLResource

Bases: ll.url.Resource

A subclass of Resource that handles HTTP, FTP and other URLs (i.e. those that are not handled by FileResource or RemoteFileResource).

ll.url.first

Return the first URL from urls that exists as a real file or directory. None entries in urls will be skipped.

ll.url.firstdir

Return the first URL from urls that exists as a real directory. None entries in urls will be skipped.

ll.url.firstfile

Return the first URL from urls that exists as a real file. None entries in urls will be skipped.
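
The logic shared by first(), firstdir() and firstfile() can be sketched for local paths with the standard library. `first_existing` is a hypothetical helper, not the ll.url implementation; in particular, returning None when nothing matches is an assumption of this sketch.

```python
import os
import tempfile


def first_existing(paths, test=os.path.exists):
    """Return the first path that passes test, skipping None entries."""
    for path in paths:
        if path is not None and test(path):
            return path
    return None  # assumption: nothing found means None


with tempfile.TemporaryDirectory() as tmp:
    real_file = os.path.join(tmp, "settings.ini")
    open(real_file, "w").close()
    missing = os.path.join(tmp, "missing.ini")
    candidates = [None, missing, tmp, real_file]

    any_hit = first_existing(candidates)                       # like first()
    file_hit = first_existing(candidates, test=os.path.isfile)  # like firstfile()
    dir_hit = first_existing(candidates, test=os.path.isdir)    # like firstdir()
    results = (any_hit == tmp, file_hit == real_file, dir_hit == tmp)

print(results)
```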

ll.url.here

Return the current directory as an URL object.

ll.url.home

Return the home directory of the current user (or the user named user, if user is specified) as an URL object:

>>> url.home()
URL('file:/home/walter/')
>>> url.home("andreas")
URL('file:/home/andreas/')
ll.url.httpdate

Return a string suitable for a “Last-Modified” or “Expires” header.

dt is a datetime.datetime object in UTC.
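
The date format used in these HTTP headers can also be produced with the standard library. The sketch below uses email.utils.format_datetime for illustration; it is not the implementation of httpdate(), and it requires a timezone-aware datetime, which may differ from what httpdate() expects.

```python
import datetime
from email.utils import format_datetime

dt = datetime.datetime(1994, 11, 15, 8, 12, 31, tzinfo=datetime.timezone.utc)

# usegmt=True writes the zone as "GMT", as HTTP requires.
header = format_datetime(dt, usegmt=True)
print(header)  # Tue, 15 Nov 1994 08:12:31 GMT
```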

ll.url.root

Return a blank root URL, i.e. URL("root:").

Special features of ll.url

The class ll.url.URL supports many common schemes and one additional special scheme named root that deserves an explanation.

A root URL is supposed to be an URL that is relative to a “project” directory instead of to the base URL of the document that contains the URL.

Suppose we have a document with the following base URL:

>>> from ll import url
>>> base = url.URL("root:company/it/about/index.html")

Now, if we have the following relative URL in this document:

>>> url1 = url.URL("images/logos/spam.png")

the combined URL will be:

>>> base/url1
URL('root:company/it/about/images/logos/spam.png')

Now if we use this combined URL and interpret it relative to the base URL we get back our original relative URL:

>>> (base/url1).relative(base)
URL('images/logos/spam.png')

Let’s try a root URL now:

>>> url2 = url.URL("root:images/logos/spam.png")

Combining this URL with the base URL gives us the same as url2:

>>> base/url2
URL('root:images/logos/spam.png')

But if we interpret this result relative to base, we’ll get:

>>> (base/url2).relative(base)
URL('../../../images/logos/spam.png')

I.e. this gives us a relative URL that references url2 from base when both URLs are relative to the same root directory.
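
The path arithmetic behind this last result can be reproduced with posixpath from the standard library, treating the shared root directory as /. This is only a sketch of the computation on plain path strings, not of how relative() is implemented.

```python
import posixpath

# Both paths are taken relative to the same root directory, written here as "/".
base_dir = posixpath.dirname("/company/it/about/index.html")
rel = posixpath.relpath("/images/logos/spam.png", start=base_dir)

print(rel)  # ../../../images/logos/spam.png
```

The three `..` steps correspond to leaving the directories about, it and company before descending into images/logos.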