2017-03-25

Makron - template language and evaluator

Makron (swedish for macaron) is the programming language used for text-based code generation in the Biotope application server. It is a kind of template or string processing language. It is typically used for server side generation of HTML code, but it is also suitable for any text based target languages like Graphviz, JSON, CSS and Javascript. The current implementation of the interpreter is tightly coupled with the relational model described in A Distributed and Relational Data Model.

Execution starts by invoking a well known macro that is stored as a Code entity in the relational model and ends when the macro is completely evaluated. A Makron program has access to the invocation URL and its parameters, can iterate over the result from a query to the relational model and can download content from the net and use all of these sources when composing its text-based result. Conditional execution is driven by regex matching, which also handles the limited parsing that is possible. A future goal is to include a proper parser generator. A Makron program is typically responsible for creating the response to GET requests to a Biotope servlet.

A Makron program cannot directly modify the relational data model. Instead, it is the JS/HTML application generated by the Makron evaluator that can modify the model through the Logger and Archiver servlets in Biotope.

High Level Language Overview

  • Everything is UTF-8 encoded text.
  • A string of text can be bound to a symbol. A symbol can either represent data or macro code.
  • A string of text can be evaluated, which means that macro invocations that are found in the text are substituted with the result of the recursive evaluation of the macro (in a new local symbol scope). Macros are invoked using a language construct called a macaron, e.g. {macro_name.arg1.arg2.arg3}. The rest of the text being evaluated is left unchanged.
  • Macaron start and end chars and argument delimiter chars are dynamically exchangeable to easily adapt to the properties of the target language.
  • Macro arguments are optionally evaluated in the invocation context or in the macro context. The macro itself explicitly performs eager, lazy or no evaluation of its arguments.

Hello, world!

A trivial program with no macro invocations (macarons) will just output itself as the result.

Hello, world!

>>> Hello, world!

Before we get to more advanced evaluation rules we need some definitions.

Symbols

A symbol name can use the characters a-z, A-Z, 0-9, _ and -.

This_is_a_Symbol

Macarons

The default macaron start and end chars are '{' and '}'. It is a macaron that triggers macro invocation and all characters in the macaron are part of the invocation.

{This is a macaron}

It is possible to change the macaron chars dynamically in the source by putting a macaron framed with new macaron chars directly inside a macaron. Valid macaron chars are '(', ')', '{', '}', '[', ']', and '<', '>'.

{(Parenthesis are now used as macaron chars within this macaron.)}

The change is tied to the lexical scope and does not affect macros that are invoked from within the macaron.

Escaped Chars

The escape char '\' can be used to prevent the normal interpretation of characters with special meaning, for example if you want to type a macaron char but don't want to start or end a macaron. It can also be used to insert newline '\n' and tab '\t'. A backslash is written '\\'.

Delimiter Chars

A delimiter char is a character not part of any of the other categories above. It is used to separate macro name and the arguments in a macaron.

Now we have what we need to go on with macro evaluation.

Macro Evaluation

During evaluation, the macro source code is read character by character and they are typically passed on unprocessed to the output stream. Escaped characters are passed on to the output stream without the escape char '\'.

Plain \{text\}

>>> Plain {text}

Macro Invocation

When a macaron is encountered it is not passed on to the output stream. Instead, macro invocation is initiated. First, a macro name followed by a delimiter char is scanned for. The process is a form of evaluation and other macros can be invoked during the scan. When a delimiter char is found, evaluation stops and the rest of the unprocessed input stream up to the end of the macaron is treated as the macro argument. The source code of the new macro is loaded and evaluated in a new scope with the variables self, '.', body, start and end bound to new values.

{macro/with/{argument}/body}

Bindings in new evaluation scope:

self  = macro
.     = /
body  = with/{argument}/body
start = {
end   = } 

Function: = (let)

Makron is dynamically scoped with all the usual drawbacks, but don't dismiss this language just yet. For small programs implementing view generation, it is quite convenient to not have to specify all necessary parameters in every macro invocation. Symbols are semi-global, i.e. global in its local execution branch. The code structure with one macro per file also doesn't make static scope very useful.

A symbol always refer to a stream of UTF-8 encoded text. Symbols can be defined in the local scope using the built-in function let. It will evaluate both of its arguments before binding the first argument, the symbol name, to the second argument, the value.

{let.fn1.Hello, world!}

or

{=fn1=Hello, world!}

 The content of the symbol fn1 can then be evaluated, i.e. a macro invocation.

{fn1}

>>> Hello, world!

Macro: ' (quote)

We will now define a second symbol containing code that calls fn1.

{=fn2={'{fn1}}}

The argument to let is always evaluated, therefore we have to prevent the premature evaluation of {fn1} with the quote macro. We could also escape the bracket or temporarily changed the macaron chars with the same effect.

{let.fn2.\{fn1\}}
{let.fn2.{( {fn1})}} 

The result when evaluating fn2 is the same, as expected.

{fn2}

>>> Hello, world!

Function: $ (value)

If we instead want to get the content of the symbol fn2 without evaluating it, we can use the built-in function $.

{$$fn2}

>>> {fn1}

The reason for the double $$ is that $ takes two arguments and we have given an empty first argument. It is very common that strings need to be escaped in different ways when inserting them in program code. Therefore the $ function takes a first argument that specify what kind of escaping that needs to be applied before output. A number of escape schemes are defined, e.g. qoute, squote, html, url.

{$url$fn2}

>>> %7Bfn1%7D

Function: [space] (identity)

As we saw above we can use the space function as a function that just evaluates and outputs its argument. This is for example useful when we want to change macaron chars but don't want to invoke a macro as the first thing.

{( {fn1})}

>>> {fn1}

Functions: first and rest

If we break down the anatomy of a macro invocation we can see that it consists of a macro name followed by a delimiter char and the rest is the argument. Yes, always a single argument. The macro itself needs to break up the single argument into parts if it needs multiple arguments. To be able to do this it can access the delimiter char that was used in the invocation with the macro '.', i.e. {.}. The unevaluated argument itself is bound to the variable body.

There are two helper functions first and rest that are useful when splitting up body. first returns the part before the first delimiter char and rest returns the part after the first delimiter char.

Function: @ (arg)

{arg.a1.{first{.}{$$body}}}
{arg.a2.{first{.}{rest{.}{$$body}}}}
{@a3@{rest{.}{rest{.}{$$body}}}}


The function arg works like let with the difference that it evaluates the second argument twice, first in the local scope and the second time in the scope of the caller. This is the mechanism that turns the macro into a function with eager evaluation of its arguments.

By convention, the last argument is assigned to what is left without looking for the delimiter char, i.e. no call to first. This means that the last part of body can include the delimiter char as data without the need to escape it.

Normally, a function splits its argument at the delimiter chars and then evaluates the parts separately. This allows variables containing arbitrary strings to be passed as arguments. If the argument is first evaluated as a whole and then split at the delimiter, the arbitrary strings that are inserted into the argument may have introduced unwanted delimiter or macaron chars that will ruin the intended argument separation. first and rest are different though. They will first evaluate their argument and then look for the delimiter char.

Function: ~ (eval)

Sometimes it can be handy to be able to evaluate a string as a macro without first bind it to a symbol. The function eval will evaluate its argument and then evaluate the resulting string once more.

Function: ^ (upeval)

The upeval function is a mix of eval and arg. It will evaluate its argument and then evaluate the resulting string in the scope where the current macro was invoked.

Function: ? (query)

To get data from the relational model, the function query is used. The function evaluates the first argument and sends it as a sql query to the database. The second argument, when evaluated, is the code that will be evaluated for each row in the query result. The third argument, when evaluated, is the code that will be evaluated between each row. Brackets in the query will mark the symbol name to bind to each column in the result.

{query|SELECT s.n AS [num] FROM (VALUES(1),(2),(3),(4),(5)) s(n)|{'{$$num}}|, }

>>> 1, 2, 3, 4, 5

Function: % (regex)

Using regular expressions, it is possible to conditionally evaluate Makron code. When the regex matches a string, one code path is evaluated. If it doesn't match, another code path is evaluated. The whole input string must match, not just a part of it. The regex can also be used as a simple parser as regex groups can be bound to variables in Django style.

Pre-Defined Symbols

When the Makron evaluator is running in a servlet context we very much like to get the URL and its parameters to be able to respond accordingly. The following symbols are pre-defined in such a context.

contexturl - Contains the URL to the servlet engine.
rooturl - Contains the ULR to the servlet.
url - Contains the part of the URL after rooturl. Always starts with '/'.
p_xyz - Contains the value of the URL parameter xyz.

Symbols in the Relational Data Model

I mentioned earlier that the first function that was invoked would be retrieved from the relational data model. Symbols stored in the relational data model are available in the outermost scope. A symbol is represented by an entity in the relation code10c1. It is associated with a name through the chain code10c1-name11c11-text_value. The contents is stored in a web archive with an URL found through the chain code10c1-data11cn1-location02cn1-text_value.  

Undefined Functions

Macarons that invoke an undefined macro are ignored and can be used as source code comments. 

Arithmetic Operations

Ongoing... 

Input Qualification

Ongoing...

2017-01-01

Laser Trap: Dexterous Escape

Laser Trap: Dexterous Escape is a hobby project I have been working on for some time. In this post I will go though the things you will need to build your own laser course at home. The goal is to move as fast as possible through a room filled with laser beams without touching any of the beams.


To learn more about how the hardware is constructed, read Designing a Laser Trap. If you want to know how to put a trap together, see Assembling a Laser Trap.

Equipment

Laser Trap, Battery, Mirrors, Smoke, Android phone/tab, The app Dexterous Escape.

Troubleshooting

The laser and the led turns off and on. Contact with Android app is lost.

The trap is responding slowly on the commands from the Android app.

Assembling a Laser Trap

This post shows how to assemble a laser trap from parts. If you want to know how the trap is designed, read Designing a Laser Trap. If you want to know how it can be used, check out Laser Trap: Dexterous Escape.


The following tools are useful to have during the assembly of a laser trap. Soldering iron, tin solder, flux, tweezers, nippers, toothpick (for the flux), microscope, Blu-Tack, permanent marker and a FTDI USB TTL serial cable.

You need some experience at soldering to be able to put together a laser trap. Some of the components are quite small and can be difficult to handle for a beginner.

Main Board

The following parts are needed when assembling the main board for a laser trap.

1x Main PCB
1x 75Ohm resistor, 0.1W, 0603 package (R1)
1x 1uF ceramic capacitor, 0603 package (C1)
1x 1.8V LED, 0603 package (D1)
1x HM-10 Bluetooth LE Module
1x LDO regulator 3.3V, SOT-223 package (U1)
2x 1x3p female connector
1x 9V battery connector
1x Steel stripe



Start by applying flux to the pads for R1, C1 and D1. This will make the components stick a little and you avoid having to chase the components around the board with the soldering iron. Place the components on the pads and solder them in place. Note that D1 have polarity and should be placed with the cathode pointing towards R1. The polarity is marked on the back with a T. The top side of the T is the cathode. (This is a bit strange as all guides on the net say the opposite, but the diode I am using, HSMH-C190, has its marking reversed.)


Put a small amount of Blu-Tack on the back of HM-10 and put it in place on the Main PCB and solder it to the seven pads. If you have Main PCB R1 (without revision marking) you only need to solder one of the four ground pads on the lower side. If you have a HM-10 module without USB as the one in the image, you don't even have pads for two of the ground connections.


Put U1 in position and solder the three legs and the heat sink in place.


Insert the leads from the 9V battery connector into the holes below U1 and solder them in place. Black is ground and should be soldered to the hole out in the corner of the PCB.

If you have Main PCB R1 you need to cut the copper trace that connects the two outer pads in the lower 3p connector. You should then connect the lower pad to one of the spare ground pads next to HM-10.


Put both 1x3p female connectors in position and solder them in place.

Mark the top side of the board as shown in the image.


Connect a 9V battery or another power source between 4.8V and 15V. The LED should start to blink. Attach a USB TTL serial cable to the serial terminals and exchange the following request response sequence at 9600-8N1.
   AT+MODE1
   OK+Set:1
   AT+NAMELASERTRAP
   OK+Set:LASERTRAP

The trap is now ready to connect to the Android application.


For lasers that draw more than 30mA the voltage regulator needs to be cooled more efficiently than what the PCB alone can manage. Attach the steel stripe with Blu-Tack on the opposite side of U1. This stripe is also useful for mounting the trap and sensor in the correct position to monitor the laser beam.


Sensor Board

The following parts are needed when assembling the sensor board for a laser trap.

1x Sensor PCB
1x 1x3p male connector, 90 degree angle
1x 1.3kOhm resistor, 0.1W, 0603 package (R1)
2x 1x1p male connector, straight
7x Photo resistor


Start by applying flux to the pads for R1. Place the component on the pads and solder it in place.


Insert the two photo resistors at the top and the two photo resistors at the bottom into the mounting holes. Cut the leads and solder them in place.


Remove the pins from the two 1x1p male connector and put the remaining plastic tubes on the outer legs of the right and left photo resistors as support. Insert the three center row photo resistors into the mounting holes. Cut the leads and solder them in place.

Put the 1x3p male connector in position and solder it in place.


Red Laser

The following parts are needed when assembling the red laser for a laser trap.

1x 1x3p male connector, straight
1x Red Laser Module
2x Shrink tubes
1x Steel stripe
1x Copper bracelet
1x M5 bolt
1x M5 nut

Green Laser

The following parts are needed when assembling the green laser for a laser trap.

1x Driver PCB
1x 1x3p male connector, 90 degree angle
1x 200Ohm resistor, 0.1W, 0603 package (R1)
1x BC807 PNP transistor, SOT-23 package (Q1)
1x Green Laser Module
1x Steel stripe
1x Copper bracelet
1x M5 bolt
1x M5 nut

Laser Trap

The following parts are needed when assembling a laser trap.

1x Main Board
1x Sensor Board
1x Red or Green Laser

Designing a Laser Trap

The game Laser Trap: Dexterous Escape uses dedicated hardware modules called traps for its implementation. Each trap has a laser module, which can be turned on and off, and a light sensor which can measure the amount of light that illuminates its surface. The game is supervised by an Android application which uses Bluetooth LE to communicate with the traps.

Central Processing Unit

When I started designing the trap I didn't know which parts I would use. I wanted to use wireless communication as I imagined the laser module and the sensor to be located in opposite parts of the room. Wifi and Bluetooth LE were two technologies I considered. I knew wifi modules used a lot of power so I began to search the net for BLE modules with GPIO capabilities. HM-10 [1] turned up frequently. A pair of HM-10 modules can connect to form a bidirectional serial data link, but we are only interested in the AT commands. When I read the specification I figured I could use the AT+PIO command to control the laser and read the state of the light sensor, although a GPIO with interrupts would have been better than constant polling. Later I found the AT+ADC command which allows the application to read the exact sensor level between 0V and 3.3V instead of just a digital 0 or 1 with a switch of value somewhere between 0.8V and 2.0V. It is handy to be able to adjust the detection level in software to be able to adapt to different light conditions. HM-10 can be set to AT+MODE1, where it can receive AT commands over the air, and not only from the wired serial port which is the default. To distinguish traps from other BLE modules nearby I decided to use the command AT+NAME and call them LASERTRAP.

Power Supply

The HM-10 runs on 3.3V. It is practical to have a voltage regulator supply this voltage and not a battery directly, as it makes it possible for the trap to accept a wider range of voltages (4.8V-15V) from various sources. A low current voltage regulator is cheaper, but I also want it to be able to supply power to the laser module. I decided to use a 800mA regulator [2].

Status LED

HM-10 will use a Light Emitting Diode (LED) D1 [3] connected to PIO1 as a status indicator. It will blink when it is waiting for a Bluetooth connection to be established. When connected it will be steady on. A LED have polarity and the anode should be connected to PIO1. The cathode should be connected to ground through a resistor R1 that will limit the current that flows through the LED. With PIO1 delivering 3.3V and a LED rated at 1.8V, 20mA, gives us a R1 = (3.3V-1.8V)/0.020A = 75Ohm.

The Laser Module

As the HM-10 runs on 3.3V it is practical to choose a laser module which can run on 3V, as it means they can share a single voltage regulator. Powerful lasers can damage the eye therefore I decided to use lasers with 1mW power or less. I selected PIO2, one of the available output pins on the HM-10, to control the laser. I decided PIO2 should act as ground for the laser and sending AT+PIO20 will turn on the laser and sending AT+PIO21 will turn off the laser. The laser module is connected to 3.3V with the positive part and the negative part is connected to PIO2.

Now I had to think about how much current PIO2 can handle before it starts to raise significantly above ground, say 0.3V. This is caused by the inner resistance Ri between the pin and the true ground. With a measured Ri = 10Ohm, PIO2 can handle 0.3V/10Ohm = 30mA. This is the maximum current the laser module is allowed to draw if it needs 3V to operate.

For more power hungry laser modules it is necessary to add a driver stage between PIO2 and the laser. Using a PNP transistor with an emitter to collector current large enough to drive the laser (BC327, BC807 [4]) we can still use PIO2 to turn the more power hungry laser on and off. The emitter is connected to 3.3V. To turn on the flow of current from emitter to collector which will drive the laser, the transistor base needs to be grounded through PIO2. An emitter to base current of 10mA is enough for the transistor to turn on the emitter-collector current. As the emitter-base pair is a diode with 1.2V voltage drop, it is necessary to add a current limiting resistor R1 between the base and PIO2. R1 = (3.3V-1.2V)/0.010A = 200Ohm. The voltage drop over the emitter-collector pair is typically between 0.05V and 0.3V, which leaves 3V for the laser. The laser module is connected to the collector with the positive part and the negative part is connected to ground.

The most affordable laser module is 650nm red. Unfortunately, our eyes are only about 20% efficient at this wavelength. This means the laser beam can not be seen from all angles in a dark room with smoke or fog in the air. For best visibility the more expensive 532nm green can be used. It is visible from all angles. Somewhere in between, both for visibility and price, is the 635nm red.

The Light Sensor

To detect that the uninterrupted laser beam reaches its destination a photo resistor (GL5528 [5]) can be used. The resistance of a photo resistor will drop when the light intensity at the surface increases. Together with a fixed resistor in series it forms a voltage divider for the 3.3V reference voltage. By measuring the voltage over the photo resistor it is possible to determine how much light it is detecting. I soon found out that it is quite difficult to aim a laser to point exactly at a 5mm photo resistor. I decided to use seven photo resistors in parallel to increase the target area where the laser beam can be detected. A 1mW laser gives us a resistance of 400Ohm over the photo resistor. I want the sensor to be compatible with a digital zero at 0.8V. With the same current flowing through both R1 and the photo resistor, we have the equivalence 0.8V/400Ohm = (3.3V-0,8V)/R1, which can be solved as R1 = 400Ohm*2.5V/0.8V = 1250Ohm, which is the lowest value of R1 that is acceptable to properly detect a 1mW laser.

The typical use case for the sensor designed above is a dark room. Normal daylight that reaches the sensor will make the resistance drop below 400Ohm and therefore make the sensor unusable under such conditions. We can make the sensor less sensitive by raising the laser detection level to 2.5V in a dark room. We will by this modification extend our headroom to also be able to work in a bright room, but at a detection level somewhere below 2.5V, as the combined light that hits the sensor will be more intense than the laser alone. The Android application will have to dynamically adjust to the correct laser detection level to be able to handle varying ambient light. R1 = 400Ohm*0.8V/2.5V = 128Ohm.

References

[1] http://www.jnhuamao.cn/Bluetooth40_en.zip
[2] http://www.diodes.com/_files/datasheets/AZ1117C.pdf
[3] http://www.avagotech.com/docs/AV02-0551EN
[4] http://cache.nxp.com/documents/data_sheet/BC807_BC807W_BC327.pdf?pspll=1
[5] http://akizukidenshi.com/download/ds/senba/GL55%20Series%20Photoresistor.pdf

2015-02-15

When is Always an Interval

A record of when an event took place is always a time interval, never a single moment in time. For the quickest events, it is the resolution of your time measuring equipment that sets the length of the interval. For photos it is the shutter speed. For video recordings it is the start and end time. When you talk about the year you were 12, the interval is a full year. The only time you can get away with only recording one moment in time, is when all events share the same duration.

2015-02-09

A Quick Comment on Object Oriented Programming

Ok, here is a quick comment on Object Oriented Programming, to finally put to rest the complicated ways it has been taught, 20 years too late and 7 years after I first thought I should make it.

A printing press is not built from books. That would just look silly and not work very well. It consists of specialized parts that are used to operate on books. Then why are programmers that build virtual printing presses insisting on adding a Book class to the machine? The things the machine operate on has nothing to do with the parts it is assembled from. And this is equally true for both physical and virtual machines.

2015-02-05

A Distributed and Extensible Relational Data Model

In this post, I will describe the design of a relational data model
  • that can be distributed (available, partition tolerant, eventual consistent).
  • that can be dynamically extended by the applications operating on it.
  • where the cardinality of a relation is decided at query time.
  • where relations are dynamically typed and queries are statically typed.
  • where all past values of relations are kept in the data model, to make it possible to revisit them at a later time.
  • where hashes are used to reference values, which make them location independent and makes it possible to store values redundant.
  • that is built on simple principles and easy to implement using conventional technology.

Relations

In a relational data model, it is the associations between the entities (objects, things) that we are interested in. They are collected in relations. This representation is complementary to the object-oriented (hierarchical, document) data model, where objects (records, documents) contain references to other objects. One advantage of the relational data model is that it doesn't impose any particular access pattern, as relations can be navigated in any direction [1].




The entities in the relational data model we are about to construct are represented by unique identifiers, which are used to track the entities across multiple relations. An entity is defined by the relations it is part of. It has no other attributes or properties in addition to this.

To illustrate this, imagine that we want to keep track of the persons which appears in the photos in our digital photo collection. For this we need the relation appears_in20.

CREATE TABLE appears_in20 (a uuid, b uuid);

The relation is a binary relation with 2 entities and 0 values, indicated by the digits '20' at the end of the name. As we can see, there is nothing in the relation that mentions persons or photos. Any type of entities can be related in this relation. This could cause us trouble further on when constructing queries that should only give us persons and photos. Therefore, we will also create unary identity relations for each type of entity.

CREATE TABLE person10 (a uuid);
CREATE TABLE photo10 (a uuid);



So far, we have shown how to relate entities to form a graph. To go beyond the pure graph and fill the model with some content, we need to be able to relate entities with values (attributes) as well. A value is a sequence of bytes which is given meaning through its type. A sequence of bytes can be hashed to give us a uniform reference to any value [2].

We go back to the persons and photos, and decide that a person should have a name encoded in utf8 (text/plain) and that a photo should be associated with a jpeg file (image/jpeg). A SHA256 hash in hex format is 64 characters long.

CREATE TABLE name11 (a uuid, h char(64));
CREATE TABLE image11 (a uuid, h char(64));

CREATE TABLE text_plain01 (h char(64));
CREATE TABLE image_jpeg01 (h char(64));

Still, we have no actual values in the model, just references to them. This is solved by adding a relation between the hash and the value.

CREATE TABLE text_value (h char(64), v text);
CREATE TABLE binary_value (h char(64), v bytea);

There will be one value relation for each native database type we want to use.

Large binary data may not be suitable to store in the database. It can also be more practical to fetch an image or document from a simple web server instead of the database. It depends on the system architecture. For these values, we introduce a relation to associate the value with an URL. The URL is itself a value with type text/plain.

CREATE TABLE location02 (h char(64), i char(64));

The mapping of mime types to database types and whether values of a certain type are stored in the database or not is knowledge that needs to be shared among the various parts in the system that interacts with the relational model.

The arrangement to use hashes as references to values allow us to store values redundant, duplicated at several locations. Typically, the location02 relation is populated by crawlers that find and track values stored in archives, on the web or local.

A legitimate concern with the proposed data model is the many joins required to put together a record with all values and associations for an entity. The overall performance of the data model may not be as good as a more complex relational model, but we are willing to trade performance for simplicity in this case. The goal is to allow the data model to grow dynamically, using just a few core concepts. To simplify the construction of queries, the plan is to design a specialized query language that can take advantage of the symmetries in the data model.

We observe that the relation appears_in20 has the cardinality many-to-many. How other cardinalities should be represented will be addressed later. First, we will take a step back and see how we can create and update relations.

A Distributed Event Log

Using the log at LinkedIn [3][4] as a source of inspiration, we will use a log of events as the original source of information flowing into the database. The log is kept in a directory in the filesystem. A log entry is one file, containing a JSON array of objects. Each event can either associate or dissociate entities and values using named relations.

[
  {
    "a": "daa8665c-df7c-4f09-959f-27f71eb65884",
    "r": "person10",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  },
  {
    "h": "245536acb08b0958f1aa78d391d5efc773e60fa2c39d942318aa04d6085bef40",
    "v": "Marcus",
    "r": "text_value"
  },
  {
   "h": "245536acb08b0958f1aa78d391d5efc773e60fa2c39d942318aa04d6085bef40",
    "r": "text_plain01",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  },
  {
    "a": "daa8665c-df7c-4f09-959f-27f71eb65884",
    "h": "245536acb08b0958f1aa78d391d5efc773e60fa2c39d942318aa04d6085bef40",
    "r": "name11",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  },
  {
    "a": "0b7a6027-756d-4f5e-ae1d-e36c593b5e95",
    "r": "photo10",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  },
  {
    "a": "daa8665c-df7c-4f09-959f-27f71eb65884",
    "b": "0b7a6027-756d-4f5e-ae1d-e36c593b5e95",
    "r": "appears_in20",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  }
]

With a modest amount of mental effort, it is possible to map this notation to the relations defined earlier. JSON attributes 'a' - 'g' are entities and 'h' - 'n' are values. The first log entry that introduces a new relation will effectively create the corresponding table, indexes and views, allowing the schema to grow dynamically. The JSON attribute 'r' names the relation in question. The JSON attribute 'o' names the operation to perform on the relation, where 'associate' adds the association to the relation. The counteraction is 'dissociate', which removes the association from the relation. The log entry is fixed in time with the JSON attribute 't'. (We can also see that values are logged using a simpler relation without time and operation. The hash of a value simply exists as an immutable fact.) All of these attributes come together to create an append-only relation in the database.

CREATE TABLE appears_in20all (a uuid, b uuid, o text, t timestamp);

From this, we derive the appears_in20 relation. The plan is to include a pair (a', b') only if the number of rows with o='associate' for the pair is more common than the number of rows with o='dissociate'. Expressed alternatively, there must be more log entries that want to associate a' and b', than there are log entries that want to dissociate the two.

CREATE VIEW appears_in20 AS (
   SELECT a, b FROM (
      SELECT DISTINCT ON (a, b) a, b, o FROM appears_in20all
         GROUP BY o, a, b
         ORDER BY a, b, COUNT(o) DESC, o DESC) AS c
   WHERE o='associate');

The query groups the related entity pair with the operation in the inner SELECT and counts how many times each combination occurs. It then sorts them in descending order to get the most frequently occurring combination at the top. When the number of 'associate' and 'dissociate' operations are equal for a pair, 'dissociate' will end up on the top by a descending sort. It then picks the top row for each pair with DISTINCT ON (a, b). As a final touch, the outer SELECT will only keep the rows where the operation is 'associate'.
 
The reason for this arrangement is to let the database be distributable while trying to reach eventual consistency. We want to distribute the database to get data redundancy rather than load balancing. By synchronizing the event log directory between the participating nodes on the file level [5][6], changes from different nodes are merged. Just make sure the file names of the log entries are unique, for example a hash of the contents. I know I am out on thin ice here, but I believe this corresponds to an Available and Partition tolerant system in the context of the CAP theorem. The distributed system as a whole does not guarantee Consistency, even though it will eventually (sometimes) be consistent. The limitation of this arrangement is that the nodes must be cooperating and can't compete for limited resources. You need distributed transactions or transactions on a common write node to be able to do that.

Another benefit is the ability to go back in time, by creating a view like appears_in20 with a time constraint. Yet another benefit is the possibility for an application to read the entire history of how a relation has evolved to become what it is today, and make it possible to undo changes.

The data doesn't have to be merged into a single log. It is possible to partition the log locally and let the data be merged in the database. You can partition on where the data comes from, who is the author, what domain it belongs to or any other reason. This makes it easy to incorporate, discard and recombine information from different sources in the local database.

Cardinality

So far we have defined a many-to-many relation. We would also like to define one-to-many and one-to-one relations. This can be done using the recorded timestamp to only select the most recent association. We will begin with a redefinition of the many-to-many relation.

CREATE VIEW appears_in20cnn AS (
   SELECT a, b, t FROM (
      SELECT DISTINCT ON (a, b) a, b, o, max(t) as t FROM appears_in20all
         GROUP BY o, a, b
         ORDER BY a, b, COUNT(o) DESC, o DESC) AS c
   WHERE o='associate');

CREATE VIEW appears_in20c1n AS (
   SELECT DISTINCT ON (b) a, b, t FROM appears_in20cnn
   ORDER BY a, t DESC);

CREATE VIEW appears_in20cn1 AS (
   SELECT DISTINCT ON (a) a, b, t FROM appears_in20cnn
   ORDER BY b, t DESC);

CREATE VIEW appears_in20c11 AS (
   SELECT a, b, t from appears_in20c1n
   INTERSECT
   SELECT a, b, t from appears_in20cn1);

Worth mentioning here is if a relation is used in different contexts, for example the relation portrait_of20 which relates both families and photos, and persons and photos, the relations with lower cardinality are type agnostic, meaning that a photo can only be associated one-to-one with either a person or a family.

Another observation is that for one-to-many relations, when we use the 'associate' operation to overwrite the side with cardinality one, the previous association will be pushed away into a stack of old associations. Later, when the current association is 'dissociated', the association on top of the stack will reappear. An example would be a person associated with the name 'Per' and later corrected with a new association with the name 'Pär'. When 'Pär' is dissociated, 'Per' will reappear as the associated name. To avoid this, the application will have to dissociate the old name when the name changes. This procedure also makes the many-to-many relation identical to the one-to-many relation.

To undo the effects of a log entry, just change 'associate' to 'dissociate' and vice versa to form the inverse log entry.

Case Study: the Family Album

Below is the auto generated schema for a relational model representing a family album.


Wrapping Up

We have designed a distributed relational data model, dynamically extensible by users through a replicated event log. Relations associate entities, identified by UUID, with other entities and/or values, identified by their hash. Values are stored in remote or local archives and are accessed using URLs. Values can also be stored in the event log and be cached in the data model. Users can access historical values of a relation, alongside many-to-many, one-to-many and one-to-one interpretations of the current value of the relation.



The architecture supports a number of variations:
  • Log Reader and File Crawler can work over the web.
  • Log synchronization must not be bidirectional.
  • Data sets from different domains can easily be merged.
  • It is always possible to drop and recreate the relational database from the Event Log. This may take a long time though, if there are many events.
And going into unknown territory, we might ponder:
  • How do we merge different uuids that represent the same thing? (The aka20 relation!)
  • How far will it scale? Are there any bottlenecks?
  • What would a domain specific query language look like? Compiled to SQL.
  • Can the Web Application itself be stored in the Archive? Can it be modified in the GUI? Can it compile replacement parts of itself from source at runtime? Integrated make. Can the compiler and the build system also be stored in the archive and be modifiable? Can we undo if the ability to change the application breaks due to changes to its development environment? What if undo also breaks? Which interpreter do we target?

References

[1] Out of the Tar Pit, Ben Moseley and Peter Marks
A text in line with my own thoughts on how to design information systems. In particular, I found section 7 particularly familiar, where they describe how to turn informal requirements into formal requirements, which are then executed. I have written about this earlier here and here.

[2] The Mess we're in, Joe Armstrong
36 minutes into the video, Joe introduces the idea to use the hashes as references to values.

[3] The Log: What every software engineer should know about real-time data's unifying abtraction, Jay Kreps
Inspiring article about the usefulness of the log.

[4] Turning the Databse Inside-out with Apache Samsa, Martin Kleppman
More thoughts around the log and materialized views.

[5] http://syncthing.net/
File replication software.

[6] http://www.getsync.com/
File replication software.