ll.ul4on – Object serialization

This module provides functions for encoding and decoding a lightweight text-based format for serializing the object types supported by UL4.

It is extensible to allow encoding/decoding arbitrary instances (i.e. it is basically a reimplementation of pickle, but with string input/output instead of bytes and with an eye towards cross-platform support).

There are implementations for Python (this module), Java and Javascript (as part of the UL4 packages for those languages).

Furthermore there’s an Oracle package that can be used for generating UL4ON encoded data.

Basic usage follows the API design of pickle, json, etc. and supports most built-in Python types:

>>> from ll import ul4on
>>> ul4on.dumps(None)
'n'
>>> ul4on.loads('n')
>>> ul4on.dumps(False)
'bF'
>>> ul4on.loads('bF')
False
>>> ul4on.dumps(42)
'i42'
>>> ul4on.loads('i42')
42
>>> ul4on.dumps(42.5)
'f42.5'
>>> ul4on.loads('f42.5')
42.5
>>> ul4on.dumps('foo')
"S'foo'"
>>> ul4on.loads("S'foo'")
'foo'
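The following toy encoder and decoder make the type prefixes above concrete. This is a minimal sketch, not ll.ul4on's implementation. The helper names dump_scalar() and load_scalar() are hypothetical, 'bT' for True is assumed by analogy with 'bF', and strings are assumed to contain no quotes or backslashes, so no escaping is done.

```python
def dump_scalar(obj):
    """Encode None, bool, int, float or str with the prefixes shown above (toy sketch)."""
    if obj is None:
        return "n"
    if isinstance(obj, bool):  # must come before int: bool is a subclass of int
        return "bT" if obj else "bF"  # "bT" is assumed by analogy with "bF"
    if isinstance(obj, int):
        return f"i{obj}"
    if isinstance(obj, float):
        return f"f{obj!r}"
    if isinstance(obj, str):
        # Assumption: no quotes or backslashes in obj, so no escaping is done
        return f"S'{obj}'"
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def load_scalar(dump):
    """Invert dump_scalar() for the same restricted set of inputs (toy sketch)."""
    code, rest = dump[0], dump[1:]
    if code == "n":
        return None
    if code == "b":
        return rest == "T"
    if code == "i":
        return int(rest)
    if code == "f":
        return float(rest)
    if code == "S":
        return rest[1:-1]  # strip the surrounding single quotes
    raise ValueError(f"unknown type code: {code!r}")


for value in (None, False, 42, 42.5, "foo"):
    encoded = dump_scalar(value)
    print(f"{value!r} -> {encoded}")
    assert load_scalar(encoded) == value
```

The first character of each dump selects the type, so a decoder only needs to look at that character to decide how to parse the rest.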

date, datetime and timedelta objects are supported too:

>>> import datetime
>>> ul4on.dumps(datetime.date.today())
'X i2014 i11 i3'
>>> ul4on.dumps(datetime.datetime.now())
'Z i2014 i11 i3 i18 i16 i45 i314157'
>>> ul4on.loads('X i2014 i11 i3')
datetime.date(2014, 11, 3)
>>> ul4on.loads('Z i2014 i11 i3 i18 i16 i45 i314157')
datetime.datetime(2014, 11, 3, 18, 16, 45, 314157)
>>> ul4on.dumps(datetime.timedelta(days=1))
'T i1 i0 i0'
>>> ul4on.loads('T i1 i0 i0')
datetime.timedelta(days=1)
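The date and time dumps above can be read field by field. The sketch below reproduces them with the hypothetical helpers dump_temporal() and load_temporal(). The layout is inferred from the examples only, naive datetimes (no time zone) are assumed, and the sketch relies on timedelta normalizing itself to days, seconds and microseconds.

```python
import datetime


def dump_temporal(obj):
    """Encode date, datetime or timedelta in the X/Z/T layout shown above (toy sketch)."""
    # Check datetime first: datetime.datetime is a subclass of datetime.date
    if isinstance(obj, datetime.datetime):
        code = "Z"
        fields = (obj.year, obj.month, obj.day, obj.hour, obj.minute, obj.second, obj.microsecond)
    elif isinstance(obj, datetime.date):
        code = "X"
        fields = (obj.year, obj.month, obj.day)
    elif isinstance(obj, datetime.timedelta):
        code = "T"
        # timedelta is always normalized to days, seconds and microseconds
        fields = (obj.days, obj.seconds, obj.microseconds)
    else:
        raise TypeError(f"unsupported type: {type(obj).__name__}")
    return " ".join([code, *(f"i{field}" for field in fields)])


def load_temporal(dump):
    """Decode the output of dump_temporal() (toy sketch)."""
    code, *fields = dump.split()
    args = [int(field[1:]) for field in fields]  # drop the "i" prefix
    factory = {"X": datetime.date, "Z": datetime.datetime, "T": datetime.timedelta}[code]
    return factory(*args)


print(dump_temporal(datetime.date(2014, 11, 3)))
print(load_temporal("Z i2014 i11 i3 i18 i16 i45 i314157"))
print(dump_temporal(datetime.timedelta(hours=25)))
```

Because of the normalization, timedelta(hours=25) is dumped by this sketch as T i1 i3600 i0 (one day plus 3600 seconds).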

ll.ul4on also supports Color objects from ll.color:

>>> from ll import color
>>> ul4on.dumps(color.red)
'C i255 i0 i0 i255'
>>> ul4on.loads('C i255 i0 i0 i255')
Color(0xff, 0x00, 0x00)

Lists, dictionaries and sets are also supported:

>>> ul4on.dumps([1, 2, 3])
'L i1 i2 i3 ]'
>>> ul4on.loads('L i1 i2 i3 ]')
[1, 2, 3]
>>> ul4on.dumps(dict(one=1, two=2))
"D S'one' i1 S'two' i2 }"
>>> ul4on.loads("D S'one' i1 S'two' i2 }")
{'one': 1, 'two': 2}
>>> ul4on.dumps({1, 2, 3})
'Y i1 i2 i3 }'
>>> ul4on.loads('Y i1 i2 i3 }')
{1, 2, 3}
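All three container dumps share one structure: a type code, the encoded items, and a terminator (] for lists, } for dictionaries and sets), with dictionaries written as a flat key/value sequence. The toy decoder loads_container() below (a hypothetical name, not part of ll.ul4on) shows how a recursive reader uses the terminator to find where a container ends. It only handles integers and strings and assumes that tokens are separated by whitespace and that strings contain no spaces, quotes or escapes.

```python
def loads_container(dump):
    """Decode lists, dicts and sets of ints and strings (toy sketch)."""
    tokens = dump.split()  # assumption: strings contain no whitespace
    pos = 0

    def load():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        code = token[0]
        if code == "i":
            return int(token[1:])
        if code == "S":
            return token[2:-1]  # strip S and the quotes; no escapes handled
        if code in ("L", "D", "Y"):
            terminator = "]" if code == "L" else "}"
            items = []
            # read items until the terminator that closes this container
            while tokens[pos] != terminator:
                items.append(load())
            pos += 1  # consume the terminator itself
            if code == "L":
                return items
            if code == "Y":
                return set(items)
            # a dict is dumped as a flat sequence: key, value, key, value, ...
            return dict(zip(items[::2], items[1::2]))
        raise ValueError(f"unknown type code: {code!r}")

    return load()


for dump in ("L i1 i2 i3 ]", "D S'two' i2 S'one' i1 }", "Y i1 i2 i3 }", "L L i1 ] i2 ]"):
    print(dump, "->", loads_container(dump))
```

Since every container carries its own terminator, nesting works without storing lengths up front.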

Recursive data structures

ll.ul4on can also handle recursive data structures:

>>> r = []
>>> r.append(r)
>>> ul4on.dumps(r)
'L ^0 ]'
>>> r2 = ul4on.loads('L ^0 ]')
>>> r2
[[...]]
>>> r2 is r2[0]
True
>>> r = {}
>>> r['recursive'] = r
>>> ul4on.dumps(r)
"D S'recursive' ^0 }"
>>> r2 = ul4on.loads("D S'recursive' ^0 }")
>>> r2
{'recursive': {...}}
>>> r2['recursive'] is r2
True

Note

The ^0 part in the dump is a so-called “back reference”: it tells the decoder that at this spot the dump refers to an object that has already appeared earlier in the dump. The 0 identifies which of the previously encountered objects is meant (here the first one).
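Back references follow naturally if the encoder remembers every object it has started to dump. The class ToyEncoder below is a hypothetical sketch, not ll.ul4on.Encoder. It registers a list, dict or string before dumping its content, which is what allows a self-reference inside the object to be written as ^0. It keys its table on id(), so it also keeps the registered objects alive to prevent ids from being reused.

```python
class ToyEncoder:
    """Minimal encoder for int, str, list and dict that emits ^n back references."""

    def __init__(self):
        self._index = {}  # id(obj) -> position in order of first appearance
        self._seen = []  # keeps registered objects alive so their id() is not reused

    def dumps(self, obj):
        parts = []
        self._dump(obj, parts)
        return " ".join(parts)

    def _dump(self, obj, parts):
        if isinstance(obj, (str, list, dict)):
            index = self._index.get(id(obj))
            if index is not None:
                parts.append(f"^{index}")  # already dumped (or being dumped)
                return
            # Register before dumping the content, so that a reference to obj
            # from inside its own content becomes a back reference
            self._index[id(obj)] = len(self._seen)
            self._seen.append(obj)
        if isinstance(obj, int):
            parts.append(f"i{obj}")
        elif isinstance(obj, str):
            parts.append(f"S'{obj}'")  # assumption: no quotes to escape
        elif isinstance(obj, list):
            parts.append("L")
            for item in obj:
                self._dump(item, parts)
            parts.append("]")
        elif isinstance(obj, dict):
            parts.append("D")
            for (key, value) in obj.items():
                self._dump(key, parts)
                self._dump(value, parts)
            parts.append("}")
        else:
            raise TypeError(f"unsupported type: {type(obj).__name__}")


r = []
r.append(r)
print(ToyEncoder().dumps(r))

d = {}
d["recursive"] = d
print(ToyEncoder().dumps(d))
```

In the dict example the dict itself gets index 0 and the key string gets index 1, so the value, which is the dict again, becomes ^0.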

Extensibility

UL4ON is extensible. It supports serializing arbitrary instances by registering the class with the UL4ON serialization machinery:

from ll import ul4on

@ul4on.register("com.example.person")
class Person:
   def __init__(self, firstname=None, lastname=None):
      self.firstname = firstname
      self.lastname = lastname

   def __repr__(self):
      return f"<Person firstname={self.firstname!r} lastname={self.lastname!r}>"

   def ul4ondump(self, encoder):
      encoder.dump(self.firstname)
      encoder.dump(self.lastname)

   def ul4onload(self, decoder):
      self.firstname = decoder.load()
      self.lastname = decoder.load()

jd = Person("John", "Doe")
output = ul4on.dumps(jd)
print("Dump:", output)
jd2 = ul4on.loads(output)
print("Loaded:", jd2)

This script outputs:

Dump: O S'com.example.person' S'John' S'Doe' )
Loaded: <Person firstname='John' lastname='Doe'>

It is also possible to pass a custom registry to load() and loads():

from ll import ul4on

class Person:
   ul4onname = "com.example.person"

   def __init__(self, firstname=None, lastname=None):
      self.firstname = firstname
      self.lastname = lastname

   def __repr__(self):
      return f"<Person firstname={self.firstname!r} lastname={self.lastname!r}>"

   def ul4ondump(self, encoder):
      encoder.dump(self.firstname)
      encoder.dump(self.lastname)

   def ul4onload(self, decoder):
      self.firstname = decoder.load()
      self.lastname = decoder.load()

jd = Person("John", "Doe")
output = ul4on.dumps(jd)
print("Dump:", output)
jd2 = ul4on.loads(output, {"com.example.person": Person})
print("Loaded:", jd2)

Any type name not found in the registry dict passed in will be looked up in the global registry.
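This lookup rule (local registry first, then the global one) maps directly onto collections.ChainMap. The sketch below uses hypothetical class names and a plain dict standing in for the global registry. It illustrates the lookup order only and is not how ll.ul4on stores its registry.

```python
from collections import ChainMap


class GlobalPerson:
    pass


class LocalPerson:
    pass


class Address:
    pass


# Hypothetical contents of the global registry (normally filled by ul4on.register())
global_registry = {"com.example.person": GlobalPerson, "com.example.address": Address}
# Registry passed to loads(): overrides the global entry for "com.example.person"
local_registry = {"com.example.person": LocalPerson}

# Lookups try local_registry first and fall back to global_registry
lookup = ChainMap(local_registry, global_registry)

print(lookup["com.example.person"].__name__)  # LocalPerson: the local entry wins
print(lookup["com.example.address"].__name__)  # Address: found only globally
print("com.example.unknown" in lookup)  # False: in neither registry
```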

Note

If a class isn’t registered with the UL4ON serialization machinery, you have to set the class attribute ul4onname yourself for serialization to work.

For deserialization the class must be registered either in the local registry passed to the Decoder or globally via register().

Object content mismatch

In situations where a UL4ON API is updated frequently, it is useful to be able to update the writing side and the reading side independently. To support this, Decoder has a generator method loadcontent() that reads the content items of an object from the input stream and yields them one by one.

This makes it possible to handle both situations:

  • When the writing side outputs more items than the reading side expects, exhausting the iterator returned by loadcontent() reads and skips the unrecognized items and leaves the input stream in a consistent state.

  • When the writing side outputs fewer items than the reading side expects, the remaining attributes can be initialized with default values.

For our example class it could be used like this:

from ll import ul4on

class Person:
   ul4onname = "com.example.person"

   def __init__(self, firstname=None, lastname=None):
      self.firstname = firstname
      self.lastname = lastname

   def __repr__(self):
      return f"<Person firstname={self.firstname!r} lastname={self.lastname!r}>"

   def ul4ondump(self, encoder):
      encoder.dump(self.firstname)
      encoder.dump(self.lastname)

   def ul4onload(self, decoder):
      index = -1
      for (index, item) in enumerate(decoder.loadcontent()):
         if index == 0:
            self.firstname = item
         elif index == 1:
            self.lastname = item
      # Initialize attributes that were not loaded by ``loadcontent``
      if index < 1:
         self.lastname = None
         if index < 0:
            self.firstname = None

output = """o s'com.example.person' s'John' )"""
j = ul4on.loads(output, {"com.example.person": Person})
print("Loaded:", j)

This outputs:

Loaded: <Person firstname='John' lastname=None>
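For classes with many attributes the index bookkeeping above becomes tedious. One possible variant, sketched below, collects the items into a list (which also exhausts the generator, as required) and fills missing trailing attributes with None. StubDecoder is a hypothetical stand-in for ul4on.Decoder that only mimics loadcontent(), so the example runs on its own. With the real decoder only the ul4onload() method would be used.

```python
class StubDecoder:
    """Hypothetical stand-in for ul4on.Decoder that yields pre-decoded items."""

    def __init__(self, items):
        self._items = items

    def loadcontent(self):
        yield from self._items


class Person:
    ul4onname = "com.example.person"
    fields = ("firstname", "lastname")  # attribute names in dump order

    def __repr__(self):
        return f"<Person firstname={self.firstname!r} lastname={self.lastname!r}>"

    def ul4onload(self, decoder):
        # list() exhausts the generator, so surplus items are read and dropped
        items = list(decoder.loadcontent())
        for (index, field) in enumerate(self.fields):
            # missing trailing items fall back to None
            setattr(self, field, items[index] if index < len(items) else None)


for content in (["John"], ["John", "Doe"], ["John", "Doe", "extra"]):
    p = Person()
    p.ul4onload(StubDecoder(content))
    print(p)
```

The trade-off is that all items of the object are held in a list at once, which is harmless for objects with a handful of attributes.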

Chunked UL4ON

ll.ul4on also provides access to the classes that implement UL4ON encoding and decoding. This can be used to create multiple UL4ON dumps using the same encoding context, or to recreate multiple objects from those multiple UL4ON dumps (using the same decoding context).

An example for encoding:

from ll import ul4on

encoder = ul4on.Encoder()
obj = "spam"
print(encoder.dumps(obj))
print(encoder.dumps(obj))

This prints:

S'spam'
^0

The second call outputs a back reference, since the encoder remembers that the string "spam" has already been output.
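The key point is that the encoder's back-reference table outlives a single dumps() call. The hypothetical ChunkEncoder below (a toy that handles only strings and lists, not ll.ul4on.Encoder) makes this visible. Dumping four strings and then the list containing them yields the list dump L ^0 ^1 ^2 ^3 ].

```python
class ChunkEncoder:
    """Toy encoder for str and list whose back-reference table spans dumps() calls."""

    def __init__(self):
        self._index = {}  # id(obj) -> back-reference number
        self._seen = []  # keeps dumped objects alive so their id() stays unique

    def dumps(self, obj):
        if not isinstance(obj, (str, list)):
            raise TypeError(f"unsupported type: {type(obj).__name__}")
        index = self._index.get(id(obj))
        if index is not None:
            return f"^{index}"  # seen in this or an earlier call
        self._index[id(obj)] = len(self._seen)
        self._seen.append(obj)
        if isinstance(obj, str):
            return f"S'{obj}'"  # assumption: no quotes to escape
        return " ".join(["L", *(self.dumps(item) for item in obj), "]"])


encoder = ChunkEncoder()
obj = "spam"
print(encoder.dumps(obj))
print(encoder.dumps(obj))

encoder = ChunkEncoder()
data = ["gurk", "hurz", "hinz", "kunz"]
chunks = [encoder.dumps(s) for s in data] + [encoder.dumps(data)]
print(chunks[-1])
```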

An example for decoding:

from ll import ul4on

decoder = ul4on.Decoder()
print(decoder.loads("S'spam'"))
print(decoder.loads("^0"))

This prints:

spam
spam

because the decoder remembers the object it decoded first (from the first dump) and resolves the back reference ^0 in the second dump to that same object.
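The decoding side follows the same principle: the decoder keeps a list of the objects it has created, in order, and ^n indexes into that list across loads() calls. ChunkDecoder below is a hypothetical toy (not ll.ul4on.Decoder) that understands only strings without spaces and lists. Its reset() mimics clearing the back-reference table.

```python
class ChunkDecoder:
    """Toy decoder for str and list whose back-reference table spans loads() calls."""

    def __init__(self):
        self._objects = []  # decoded objects in order of first appearance

    def loads(self, dump):
        # Assumption: tokens are separated by whitespace and strings contain none
        tokens = iter(dump.split())
        return self._load(next(tokens), tokens)

    def _load(self, token, tokens):
        if token.startswith("^"):
            return self._objects[int(token[1:])]  # back reference
        if token.startswith("S"):
            value = token[2:-1]  # strip the S and the surrounding quotes
            self._objects.append(value)
            return value
        if token == "L":
            value = []
            self._objects.append(value)  # register first, so ^n may refer to it
            for item_token in tokens:
                if item_token == "]":
                    break
                value.append(self._load(item_token, tokens))
            return value
        raise ValueError(f"unsupported token: {token!r}")

    def reset(self):
        # forget everything decoded so far, e.g. before an unrelated dump
        self._objects.clear()


decoder = ChunkDecoder()
first = decoder.loads("S'spam'")
second = decoder.loads("^0")
print(first, second, first is second)
```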

One application of this is embedding multiple related UL4ON dumps as data attributes in HTML and then deserializing those UL4ON chunks back into the appropriate Javascript objects. For example:

from ll import ul4on
from ll.misc import xmlencode as xe

encoder = ul4on.Encoder()

counter = 0

def dump(obj):
   global counter
   counter += 1
   return f"{counter} {encoder.dumps(obj)}"

data = ["gurk", "hurz", "hinz", "kunz"]

def f(s):
   return f"<li data-ul4on='{xe(dump(s))}'>{xe(s.upper())}</li>"

items = "\n".join(f(s) for s in data)
html = f"<ul data-ul4on='{xe(dump(data))}'>\n{items}\n</ul>"
print(html)

This outputs:

<ul data-ul4on='5 L ^0 ^1 ^2 ^3 ]'>
<li data-ul4on='1 S&#39;gurk&#39;'>GURK</li>
<li data-ul4on='2 S&#39;hurz&#39;'>HURZ</li>
<li data-ul4on='3 S&#39;hinz&#39;'>HINZ</li>
<li data-ul4on='4 S&#39;kunz&#39;'>KUNZ</li>
</ul>

By iterating through the data-ul4on attributes in the correct order and feeding each UL4ON chunk to the same decoder, all objects can be recreated and attached to their appropriate HTML elements.
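In a browser this step would be done in Javascript. The sketch below shows the ordering step in Python using the standard library's html.parser, assuming that the leading number in each attribute value is the counter written by the dump() helper above. It only collects the chunks in order. Each chunk would then be passed to one shared Decoder.

```python
from html.parser import HTMLParser


class ChunkCollector(HTMLParser):
    """Collect (counter, dump) pairs from all data-ul4on attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        for (name, value) in attrs:
            if name == "data-ul4on":
                # HTMLParser has already turned &#39; back into '
                counter, _, dump = value.partition(" ")
                self.chunks.append((int(counter), dump))


page = """<ul data-ul4on='5 L ^0 ^1 ^2 ^3 ]'>
<li data-ul4on='1 S&#39;gurk&#39;'>GURK</li>
<li data-ul4on='2 S&#39;hurz&#39;'>HURZ</li>
<li data-ul4on='3 S&#39;hinz&#39;'>HINZ</li>
<li data-ul4on='4 S&#39;kunz&#39;'>KUNZ</li>
</ul>"""

collector = ChunkCollector()
collector.feed(page)
collector.close()

# Sort by counter to recover the order in which the chunks were encoded
ordered = [dump for (counter, dump) in sorted(collector.chunks)]
for dump in ordered:
    print(dump)  # each chunk would go to the same decoder, in this order
```

Document order and encoding order differ here (the <ul> comes first in the HTML but was encoded last), which is why the counter is needed.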

Incremental UL4ON and persistent objects

Objects that have an attribute ul4onid are considered “persistent” objects. The combination of ul4onname and ul4onid uniquely identifies each persistent object (even across multiple unrelated UL4ON dumps).

An Encoder dumps these objects differently from objects without a ul4onid attribute: the ul4onid is written to the dump together with the object's content.

A Decoder remembers all persistent objects it has loaded (under their ul4onname and ul4onid). If the decoder encounters the ul4onname and ul4onid of an object it already knows, it does not create a new object; instead, ul4onload() is called on the existing object. If the decoder encounters a persistent object it doesn’t know yet, it creates a new object (passing the ul4onid as the only argument to the constructor) and then calls ul4onload() on the new object.

This means that with this approach it’s possible to use one Decoder object to load multiple unrelated UL4ON dumps “incrementally” one after the other, while still merging the persistent objects in subsequent dumps into those created by previous dumps.

Note

For persistent objects, ul4onload() and ul4ondump() don’t have to dump/load the ul4onid attribute, as this is done by the Encoder/Decoder.

Note

If the value of the attribute ul4onid is None, the object is treated as an “ordinary” (i.e. non-persistent) object.

Note

For this approach, the method reset() must be called between calls to load() or loads() to reset the information about back references.
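The decoder-side bookkeeping described in this section can be modelled in a few lines. PersistentCache, Person and update_from() below are hypothetical names for a toy model, not ll.ul4on.Decoder. Objects are cached under (ul4onname, ul4onid), new ones are created by calling the class with the ul4onid, known ones are updated in place, and reset() clears only the per-dump back-reference table.

```python
class Person:
    """Hypothetical persistent class: instances are created with their ul4onid."""

    ul4onname = "com.example.person"

    def __init__(self, ul4onid):
        self.ul4onid = ul4onid
        self.firstname = None
        self.lastname = None

    def update_from(self, items):
        # stands in for ul4onload(): copy the decoded content into this object
        self.firstname, self.lastname = items


class PersistentCache:
    """Toy model of a decoder's persistent-object and back-reference tables."""

    def __init__(self, registry):
        self.registry = registry  # ul4onname -> class
        self.persistent = {}  # (ul4onname, ul4onid) -> object, survives reset()
        self.backrefs = []  # objects of the current dump, cleared by reset()

    def load_persistent(self, name, ul4onid, items):
        key = (name, ul4onid)
        obj = self.persistent.get(key)
        if obj is None:
            # first encounter: create the object, passing only the ul4onid
            obj = self.registry[name](ul4onid)
            self.persistent[key] = obj
        # known or new: load the content into the (possibly existing) object
        obj.update_from(items)
        self.backrefs.append(obj)
        return obj

    def reset(self):
        # forget back references, but keep the persistent objects
        self.backrefs.clear()


cache = PersistentCache({"com.example.person": Person})
first = cache.load_persistent("com.example.person", "42", ["John", "Doe"])
cache.reset()  # a new, unrelated dump follows
second = cache.load_persistent("com.example.person", "42", ["Jane", "Doe"])
print(first is second, first.firstname)
```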

Module documentation

ll.ul4on.register(name: str)

This decorator can be used to register the decorated class with the ll.ul4on serialization machinery.

name must be a globally unique name for the class. To avoid name collisions, Java’s class naming system should be used (i.e. an inverted domain name like com.example.foo.bar).

name will be stored in the class attribute ul4onname.

class ll.ul4on.Encoder

Bases: object

An Encoder is used for serializing an object into a UL4ON dump.

It manages the internal state required for handling backreferences and other stuff.

__init__(indent: str = None)

Create an encoder for serializing objects.

When indent is not None, it is used as an indentation string for pretty printing the output.

dumps(obj: Any) → str

Serialize obj and return the resulting dump as a string.

dump(obj: Any, stream: TextIO | None = None) → None

Serialize obj into the stream stream as a UL4ON-formatted dump.

stream must provide a write() method.

Passing None for stream may only be done by objects that call dump() to implement UL4ON serialization in their own ul4ondump() method.

class ll.ul4on.Decoder

Bases: object

A Decoder is used for deserializing a UL4ON dump.

It manages the internal state required for handling backreferences, persistent objects and other stuff.

__init__(registry: Dict[str, Callable[..., Any]] | None = None)

Create a decoder for deserializing objects from a UL4ON dump.

registry is used as a “custom type registry”. It must map UL4ON type names to callables that create new empty instances of those types. Any type not found in registry will be looked up in the global registry (see register()).

loads(dump: str) → Any

Deserialize the object in the string dump and return it.

load(stream: TextIO | None = None) → Any

Deserialize the next object from the stream stream and return it.

stream must provide a read() method.

Passing None for stream may only be done by objects that call load() to implement UL4ON deserialization in their own ul4onload() method.

loadcontent() → Generator[Any, None, None]

Load the content of an object until the “object terminator” is encountered.

This is a generator and might produce fewer or more items than expected. The caller must be able to handle both cases (e.g. by ignoring additional items or initializing missing items with a default value).

The iterator should always be exhausted once reading from it has started; otherwise the stream will be left in an undefined state.

loadcontentitems() → Generator[Tuple[str, Any], None, None]

Similar to loadcontent(), but will load the content of an object as (key, value) pairs.

For further info see loadcontent().

reset() → None

Clear the internal cache for backreferences so that a new unrelated UL4ON dump can be loaded.

However, the cache for persistent objects is not cleared.

store_persistent_object(object) → None

Add a persistent object to the cache of persistent objects.

forget_persistent_object(object) → None

Remove a persistent object from the cache of persistent objects.

persistent_object(name: str, id: str) → Any

Return the persistent object whose type name is name and whose id is id, or None if the decoder hasn’t encountered that object yet.

persistent_objects() → ValuesView[Any]

Return an iterator over all persistent objects the decoder has encountered so far.

ll.ul4on.dumps(obj: Any, /, indent: str | None = None) → str

Serialize obj as a UL4ON-formatted string.

ll.ul4on.dump(obj: Any, /, stream: TextIO, indent: str | None = None) → None

Serialize obj as a UL4ON-formatted dump into stream.

stream must provide a write() method.

ll.ul4on.load(stream: TextIO, /, registry: Dict[str, Callable[..., Any]] | None = None) → Any

Deserialize stream (which must be a file-like object with a read() method containing a UL4ON-formatted object) to a Python object.

For the meaning of registry see Decoder.__init__().

ll.ul4on.loads(dump: str, /, registry: Dict[str, Callable[..., Any]] | None = None) → Any

Deserialize dump (which must be a string containing a UL4ON-formatted object) to a Python object.

For the meaning of registry see Decoder.__init__().

ll.ul4on.loadclob(clob, /, bufsize: int = 1048576, registry: Dict[str, Callable[..., Any]] | None = None) → Any

Deserialize clob (which must be a cx_Oracle CLOB variable containing a UL4ON-formatted object) to a Python object.

bufsize specifies the chunk size for reading the underlying CLOB object.

For the meaning of registry see Decoder.__init__().