ll.ul4on – Object serialization
This module provides functions for encoding and decoding a lightweight text-based format for serializing the object types supported by UL4.
It is extensible to allow encoding/decoding arbitrary instances (i.e. it is basically a reimplementation of pickle, but with string input/output instead of bytes and with an eye towards cross-platform support).
There are implementations for Python (this module), Java and Javascript (as part of the UL4 packages for those languages).
Furthermore there’s an Oracle package that can be used for generating UL4ON encoded data.
Basic usage follows the API design of pickle, json, etc. and supports most builtin Python types:
>>> from ll import ul4on
>>> ul4on.dumps(None)
'n'
>>> ul4on.loads('n')
>>> ul4on.dumps(False)
'bF'
>>> ul4on.loads('bF')
False
>>> ul4on.dumps(42)
'i42'
>>> ul4on.loads('i42')
42
>>> ul4on.dumps(42.5)
'f42.5'
>>> ul4on.loads('f42.5')
42.5
>>> ul4on.dumps('foo')
"S'foo'"
>>> ul4on.loads("S'foo'")
'foo'
date, datetime and timedelta objects are supported too:
>>> import datetime
>>> ul4on.dumps(datetime.date.today())
'X i2014 i11 i3'
>>> ul4on.dumps(datetime.datetime.now())
'Z i2014 i11 i3 i18 i16 i45 i314157'
>>> ul4on.loads('X i2014 i11 i3')
datetime.date(2014, 11, 3)
>>> ul4on.loads('Z i2014 i11 i3 i18 i16 i45 i314157')
datetime.datetime(2014, 11, 3, 18, 16, 45, 314157)
>>> ul4on.dumps(datetime.timedelta(days=1))
'T i1 i0 i0'
>>> ul4on.loads('T i1 i0 i0')
datetime.timedelta(1)
ll.ul4on also supports Color objects from ll.color:
>>> from ll import color
>>> ul4on.dumps(color.red)
'C i255 i0 i0 i255'
>>> ul4on.loads('C i255 i0 i0 i255')
Color(0xff, 0x00, 0x00)
Lists, dictionaries and sets are also supported:
>>> ul4on.dumps([1, 2, 3])
'L i1 i2 i3 ]'
>>> ul4on.loads('L i1 i2 i3 ]')
[1, 2, 3]
>>> ul4on.dumps(dict(one=1, two=2))
"D S'two' i2 S'one' i1 }"
>>> ul4on.loads("D S'two' i2 S'one' i1 }")
{'one': 1, 'two': 2}
>>> ul4on.dumps({1, 2, 3})
'Y i1 i2 i3 }'
>>> ul4on.loads('Y i1 i2 i3 }')
{1, 2, 3}
Recursive data structures
ll.ul4on can also handle recursive data structures:
>>> r = []
>>> r.append(r)
>>> ul4on.dumps(r)
'L ^0 ]'
>>> r2 = ul4on.loads('L ^0 ]')
>>> r2
[[...]]
>>> r2 is r2[0]
True
>>> r = {}
>>> r['recursive'] = r
>>> ul4on.dumps(r)
"D S'recursive' ^0 }"
>>> r2 = ul4on.loads("D S'recursive' ^0 }")
>>> r2
{'recursive': {...}}
>>> r2['recursive'] is r2
True
Note
The ^0 part in the dump is a so-called “back reference”. It tells the decoder that in this spot an object is referenced that has already been part of the dump (the 0 indicates where in the dump the object can be found).
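The bookkeeping behind back references can be illustrated with a toy encoder. This is a minimal sketch and not the code of ll.ul4on: the function toy_dumps is hypothetical, it handles only ints, strings and lists, and it assumes that every string and list is numbered in the order in which it is first written.

```python
def toy_dumps(obj, seen=None):
    """Toy encoder (hypothetical, not part of ll.ul4on) for ints, strings and lists."""
    # ``seen`` maps id(obj) to the index under which the object was first written
    if seen is None:
        seen = {}
    if type(obj) is int:
        # Ints are written inline and never back-referenced in this sketch
        return f"i{obj}"
    if id(obj) in seen:
        # The object has been written before: emit a back reference
        return f"^{seen[id(obj)]}"
    if isinstance(obj, str):
        seen[id(obj)] = len(seen)
        return f"S{obj!r}"
    if isinstance(obj, list):
        # Register the list *before* its items, so an item can refer back to it
        seen[id(obj)] = len(seen)
        parts = [toy_dumps(item, seen) for item in obj]
        return " ".join(["L", *parts, "]"])
    raise TypeError(f"unsupported type: {type(obj).__name__}")


r = []
r.append(r)
print(toy_dumps(r))  # prints: L ^0 ]

word = "spam"
print(toy_dumps([word, word, 1]))  # prints: L S'spam' ^1 i1 ]
```

Registering a container before its items are written is what lets a recursive structure refer to itself.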
Extensibility
UL4ON is extensible. It supports serializing arbitrary instances by registering the class with the UL4ON serialization machinery:
from ll import ul4on
@ul4on.register("com.example.person")
class Person:
    def __init__(self, firstname=None, lastname=None):
        self.firstname = firstname
        self.lastname = lastname
    def __repr__(self):
        return f"<Person firstname={self.firstname!r} lastname={self.lastname!r}>"
    def ul4ondump(self, encoder):
        encoder.dump(self.firstname)
        encoder.dump(self.lastname)
    def ul4onload(self, decoder):
        self.firstname = decoder.load()
        self.lastname = decoder.load()
jd = Person("John", "Doe")
output = ul4on.dumps(jd)
print("Dump:", output)
jd2 = ul4on.loads(output)
print("Loaded:", jd2)
This script outputs:
Dump: O S'com.example.person' S'John' S'Doe' )
Loaded: <Person firstname='John' lastname='Doe'>
It is also possible to pass a custom registry to load() and loads():
from ll import ul4on
class Person:
    ul4onname = "com.example.person"
    def __init__(self, firstname=None, lastname=None):
        self.firstname = firstname
        self.lastname = lastname
    def __repr__(self):
        return f"<Person firstname={self.firstname!r} lastname={self.lastname!r}>"
    def ul4ondump(self, encoder):
        encoder.dump(self.firstname)
        encoder.dump(self.lastname)
    def ul4onload(self, decoder):
        self.firstname = decoder.load()
        self.lastname = decoder.load()
jd = Person("John", "Doe")
output = ul4on.dumps(jd)
print("Dump:", output)
jd2 = ul4on.loads(output, {"com.example.person": Person})
print("Loaded:", jd2)
Any type name not found in the registry dict passed in will be looked up in the global registry.
Note
If a class isn’t registered with the UL4ON serialization machinery, you have to set the class attribute ul4onname yourself for serialization to work.
For deserialization the class must be registered either in the local registry passed to the Decoder or globally via register().
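The lookup order described above (local registry first, global registry as fallback) can be modelled in a few lines. This is only a sketch of the rule, not the library's implementation: global_registry, toy_register and resolve are hypothetical names, and raising KeyError for an unknown name is a choice of this sketch.

```python
from collections import ChainMap

# Hypothetical stand-in for the global registry that a decorator fills
global_registry = {}


def toy_register(name):
    """Toy registering decorator (not the real ll.ul4on.register)."""
    def decorator(cls):
        cls.ul4onname = name
        global_registry[name] = cls
        return cls
    return decorator


def resolve(name, registry=None):
    """Look up ``name`` in the local registry first, then in the global one."""
    lookup = ChainMap(registry or {}, global_registry)
    return lookup[name]  # raises KeyError if neither registry knows the name


@toy_register("com.example.person")
class Person:
    pass


class LocalPerson:
    ul4onname = "com.example.person"


print(resolve("com.example.person") is Person)  # prints: True
local = {"com.example.person": LocalPerson}
print(resolve("com.example.person", local) is LocalPerson)  # prints: True
```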
Object content mismatch
In situations where an UL4ON API is updated frequently it is useful to be able to update the writing side and the reading side independently. To support this, Decoder has a generator method loadcontent() that reads the content items of an object from the input stream and yields them.
This makes it possible to handle both situations:
- When the writing side outputs more items than the reading side expects, exhausting the iterator returned by loadcontent() will read and ignore the unrecognized items and leave the input stream in a consistent state.
- When the writing side outputs fewer items than the reading side expects, the remaining items can be initialized with default values.
For our example class it could be used like this:
from ll import ul4on
class Person:
    ul4onname = "com.example.person"
    def __init__(self, firstname=None, lastname=None):
        self.firstname = firstname
        self.lastname = lastname
    def __repr__(self):
        return f"<Person firstname={self.firstname!r} lastname={self.lastname!r}>"
    def ul4ondump(self, encoder):
        encoder.dump(self.firstname)
        encoder.dump(self.lastname)
    def ul4onload(self, decoder):
        index = -1
        for (index, item) in enumerate(decoder.loadcontent()):
            if index == 0:
                self.firstname = item
            elif index == 1:
                self.lastname = item
        # Initialize attributes that were not loaded by ``loadcontent``
        if index < 1:
            self.lastname = None
        if index < 0:
            self.firstname = None
output = """o s'com.example.person' s'John' )"""
j = ul4on.loads(output, {"com.example.person": Person})
print("Loaded:", j)
This outputs:
Loaded: <Person firstname='John' lastname=None>
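The index bookkeeping in the example above can also be written against a list of attribute names. The following is a minimal sketch under stated assumptions: fill_attributes is a hypothetical helper that is not part of ll.ul4on, and a plain iterator stands in for the generator returned by loadcontent().

```python
def fill_attributes(obj, names, items, default=None):
    """Assign ``items`` to the attributes ``names`` in order (hypothetical helper)."""
    iterator = iter(items)
    for name in names:
        # next() falls back to ``default`` once the writer's items run out
        setattr(obj, name, next(iterator, default))
    # Drain surplus items so that the source is always read to the end
    for _ in iterator:
        pass


class Person:
    def __init__(self, firstname=None, lastname=None):
        self.firstname = firstname
        self.lastname = lastname


# The writer sent fewer items than expected
short = Person()
fill_attributes(short, ["firstname", "lastname"], iter(["John"]))
print(short.firstname, short.lastname)  # prints: John None

# The writer sent more items than expected
long = Person()
fill_attributes(long, ["firstname", "lastname"], iter(["John", "Doe", "extra"]))
print(long.firstname, long.lastname)  # prints: John Doe
```

Inside ul4onload() the third argument would be decoder.loadcontent(). Draining the iterator at the end mirrors the requirement that it is always exhausted.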
Chunked UL4ON
ll.ul4on also provides access to the classes that implement UL4ON encoding and decoding. This can be used to create multiple UL4ON dumps using the same encoding context, or recreate multiple objects from those multiple UL4ON dumps (using the same decoding context).
An example for encoding:
encoder = ul4on.Encoder()
obj = "spam"
print(encoder.dumps(obj))
print(encoder.dumps(obj))
This prints:
S'spam'
^0
The second call outputs a back reference, since the encoder remembers that the string "spam" has already been output.
An example for decoding:
decoder = ul4on.Decoder()
print(decoder.loads("S'spam'"))
print(decoder.loads("^0"))
This prints:
spam
spam
since the decoder remembers which object has been decoded as the first object from the first dump.
One application of this is embedding multiple related UL4ON dumps as data attributes in HTML and then deserializing those UL4ON chunks back into the appropriate Javascript objects. For example:
from ll import ul4on
from ll.misc import xmlencode as xe
encoder = ul4on.Encoder()
counter = 0
def dump(obj):
    global counter
    counter += 1
    return f"{counter} {encoder.dumps(obj)}"
data = ["gurk", "hurz", "hinz", "kunz"]
def f(s):
    return f"<li data-ul4on='{xe(dump(s))}'>{xe(s.upper())}</li>"
items = "\n".join(f(s) for s in data)
html = f"<ul data-ul4on='{xe(dump(data))}'>\n{items}\n</ul>"
print(html)
This outputs:
<ul data-ul4on='5 L ^0 ^1 ^2 ^3 ]'>
<li data-ul4on='1 S&#39;gurk&#39;'>GURK</li>
<li data-ul4on='2 S&#39;hurz&#39;'>HURZ</li>
<li data-ul4on='3 S&#39;hinz&#39;'>HINZ</li>
<li data-ul4on='4 S&#39;kunz&#39;'>KUNZ</li>
</ul>
By iterating through the data-ul4on attributes in the correct order and feeding each UL4ON chunk to the same decoder, all objects can be recreated and attached to their appropriate HTML elements.
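Recovering the chunks in the correct order can be sketched on the Python side with the standard library alone. The names ChunkCollector and ordered_chunks are hypothetical, and the sketch assumes that every data-ul4on value starts with the numeric counter followed by a space, as produced by dump() above.

```python
from html.parser import HTMLParser


class ChunkCollector(HTMLParser):
    """Collects (counter, chunk) pairs from ``data-ul4on`` attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-ul4on" and value:
                # The parser has already unescaped character references
                counter, _, chunk = value.partition(" ")
                self.chunks.append((int(counter), chunk))


def ordered_chunks(html):
    """Return the UL4ON chunks found in ``html``, sorted by their counter."""
    collector = ChunkCollector()
    collector.feed(html)
    collector.close()
    return [chunk for _, chunk in sorted(collector.chunks)]


html = (
    "<ul data-ul4on='3 L ^0 ^1 ]'>\n"
    "<li data-ul4on='1 S&#39;gurk&#39;'>GURK</li>\n"
    "<li data-ul4on='2 S&#39;hurz&#39;'>HURZ</li>\n"
    "</ul>"
)
print(ordered_chunks(html))  # prints: ["S'gurk'", "S'hurz'", 'L ^0 ^1 ]']
```

Each chunk from the returned list would then be passed, in this order, to the loads() method of one shared decoder, as in the decoding example above.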
Incremental UL4ON and persistent objects
Objects that have an attribute ul4onid are considered “persistent” objects. The combination of ul4onname and ul4onid uniquely identifies each persistent object (even across multiple unrelated UL4ON dumps).
An Encoder will dump those objects differently than other objects without an ul4onid attribute.
A Decoder will remember all persistent objects it has loaded (under their ul4onname and ul4onid). If the decoder encounters the ul4onname and ul4onid of an object it has remembered, it will not create a new object, instead ul4onload() will be called for the existing object.
If the decoder encounters a persistent object it hasn’t remembered, it will create a new object (passing the ul4onid as the only argument to the constructor) and then call ul4onload() on the new object.
This means that with this approach it’s possible to use one Decoder object to load multiple unrelated UL4ON dumps “incrementally” one after the other, but still merge the persistent objects in the subsequent dumps into those created by previous dumps.
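The get-or-create behaviour described above can be modelled without the library. This is a sketch of the rule only and not the code of Decoder: ToyDecoder, load_persistent and toy_load are hypothetical names, and a dict stands in for the content that ul4onload() would read from the stream.

```python
class ToyDecoder:
    """Models only the persistent object cache (hypothetical, not ll.ul4on.Decoder)."""

    def __init__(self, registry):
        self.registry = registry
        self.persistent = {}  # maps (ul4onname, ul4onid) to the object

    def load_persistent(self, name, id, content):
        key = (name, id)
        obj = self.persistent.get(key)
        if obj is None:
            # Unknown object: create it, passing the id as the only argument
            obj = self.registry[name](id)
            self.persistent[key] = obj
        # Known or new, the object then loads its content
        obj.toy_load(content)
        return obj


class Person:
    ul4onname = "com.example.person"

    def __init__(self, ul4onid=None):
        self.ul4onid = ul4onid
        self.firstname = None
        self.lastname = None

    def toy_load(self, content):
        # Stand-in for ul4onload(): takes a dict instead of reading a stream
        for name, value in content.items():
            setattr(self, name, value)


decoder = ToyDecoder({"com.example.person": Person})
first = decoder.load_persistent("com.example.person", "42", {"firstname": "John"})
second = decoder.load_persistent("com.example.person", "42", {"lastname": "Doe"})
print(first is second)  # prints: True
print(first.firstname, first.lastname)  # prints: John Doe
```

The second call does not create a new Person. It updates the object created by the first call, which is how data from a later dump ends up merged into existing objects.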
Note
For persistent objects ul4onload() and ul4ondump() don’t have to dump/load the ul4onid attribute, as this is done by the Encoder/Decoder.
Note
If the value of the attribute ul4onid is None the object will be treated as an “ordinary” (i.e. non-persistent) object.
Module documentation
- ll.ul4on.register(name: str)
This decorator can be used to register the decorated class with the ll.ul4on serialization machinery.
name must be a globally unique name for the class. To avoid name collisions Java’s class naming system should be used (i.e. an inverted domain name like com.example.foo.bar).
name will be stored in the class attribute ul4onname.
- class ll.ul4on.Encoder
Bases: object
An Encoder is used for serializing an object into an UL4ON dump.
It manages the internal state required for handling backreferences and other stuff.
- class ll.ul4on.Decoder
Bases: object
A Decoder is used for deserializing an UL4ON dump.
It manages the internal state required for handling backreferences, persistent objects and other stuff.
- __init__(registry: Dict[str, Callable[[...], Any]] | None = None)
Create a decoder for deserializing objects from an UL4ON dump.
registry is used as a “custom type registry”. It must map UL4ON type names to callables that create new empty instances of those types. Any type not found in registry will be looked up in the global registry (see register()).
- load(stream: TextIO | None = None) -> Any
Deserialize the next object from the stream stream and return it.
stream must provide a read() method.
Passing None for stream may only be done by objects that call load() to implement UL4ON deserialization in their own ul4onload() method.
- loadcontent() -> Generator[Any, None, None]
Load the content of an object until the “object terminator” is encountered.
This is a generator and might produce fewer or more items than expected. The caller must be able to handle both cases (e.g. by ignoring additional items or initializing missing items with a default value).
The iterator should always be exhausted when it is read, otherwise the stream will be in an undefined state.
- loadcontentitems() -> Generator[Tuple[str, Any], None, None]
Similar to loadcontent(), but will load the content of an object as (key, value) pairs.
For further info see loadcontent().
- reset() -> None
Clear the internal cache for backreferences so that a new unrelated UL4ON dump can be loaded.
However the cache for persistent objects will not be cleared.
- store_persistent_object(object) -> None
Add a persistent object to the cache of persistent objects.
- forget_persistent_object(object) -> None
Remove a persistent object from the cache of persistent objects.
- persistent_object(name: str, id: str) -> Any
Return the persistent object with the type name and the id id, or None, when the decoder hasn’t encountered that object yet.
- persistent_objects() -> ValuesView[Any]
Return an iterator over all persistent objects the decoder has encountered so far.
- ll.ul4on.dumps(obj: Any, /, indent: str | None = None) -> str
Serialize obj as an UL4ON formatted string.
- ll.ul4on.dump(obj: Any, /, stream: TextIO, indent: str | None = None) -> None
Serialize obj as an UL4ON formatted stream to stream.
stream must provide a write() method.
- ll.ul4on.load(stream: TextIO, /, registry: Dict[str, Callable[[...], Any]] | None = None) -> Any
Deserialize stream (which must be a file-like object with a read() method containing an UL4ON formatted object) to a Python object.
For the meaning of registry see Decoder.__init__().
- ll.ul4on.loads(dump: str, /, registry: Dict[str, Callable[[...], Any]] | None = None) -> Any
Deserialize dump (which must be a string containing an UL4ON formatted object) to a Python object.
For the meaning of registry see Decoder.__init__().
- ll.ul4on.loadclob(clob, /, bufsize: int = 1048576, registry: Dict[str, Callable[[...], Any]] | None = None) -> Any
Deserialize clob (which must be a cx_Oracle CLOB variable containing an UL4ON formatted object) to a Python object.
bufsize specifies the chunk size for reading the underlying CLOB object.
For the meaning of registry see Decoder.__init__().