2015-02-15

When is Always an Interval

A record of when an event took place is always a time interval, never a single moment in time. For the quickest events, it is the resolution of your time measuring equipment that sets the length of the interval. For photos it is the shutter speed. For video recordings it is the start and end time. When you talk about the year you were 12, the interval is a full year. The only time you can get away with only recording one moment in time, is when all events share the same duration.

2015-02-09

A Quick Comment on Object Oriented Programming

Ok, here is a quick comment on Object Oriented Programming, to finally put to rest the complicated ways it has been taught, 20 years too late and 7 years after I first thought I should make it.

A printing press is not built from books. That would just look silly and not work very well. It consists of specialized parts that are used to operate on books. Then why are programmers that build virtual printing presses insisting on adding a Book class to the machine? The things the machine operate on has nothing to do with the parts it is assembled from. And this is equally true for both physical and virtual machines.

2015-02-05

A Distributed and Extensible Relational Data Model

In this post, I will describe the design of a relational data model
  • that can be distributed (available, partition tolerant, eventual consistent).
  • that can be dynamically extended by the applications operating on it.
  • where the cardinality of a relation is decided at query time.
  • where relations are dynamically typed and queries are statically typed.
  • where all past values of relations are kept in the data model, to make it possible to revisit them at a later time.
  • where hashes are used to reference values, which make them location independent and makes it possible to store values redundant.
  • that is built on simple principles and easy to implement using conventional technology.

Relations

In a relational data model, it is the associations between the entities (objects, things) that we are interested in. They are collected in relations. This representation is complementary to the object-oriented (hierarchical, document) data model, where objects (records, documents) contain references to other objects. One advantage of the relational data model is that it doesn't impose any particular access pattern, as relations can be navigated in any direction [1].




The entities in the relational data model we are about to construct are represented by unique identifiers, which are used to track the entities across multiple relations. An entity is defined by the relations it is part of. It has no other attributes or properties in addition to this.

To illustrate this, imagine that we want to keep track of the persons which appears in the photos in our digital photo collection. For this we need the relation appears_in20.

CREATE TABLE appears_in20 (a uuid, b uuid);

The relation is a binary relation with 2 entities and 0 values, indicated by the digits '20' at the end of the name. As we can see, there is nothing in the relation that mentions persons or photos. Any type of entities can be related in this relation. This could cause us trouble further on when constructing queries that should only give us persons and photos. Therefore, we will also create unary identity relations for each type of entity.

CREATE TABLE person10 (a uuid);
CREATE TABLE photo10 (a uuid);



So far, we have shown how to relate entities to form a graph. To go beyond the pure graph and fill the model with some content, we need to be able to relate entities with values (attributes) as well. A value is a sequence of bytes which is given meaning through its type. A sequence of bytes can be hashed to give us a uniform reference to any value [2].

We go back to the persons and photos, and decide that a person should have a name encoded in utf8 (text/plain) and that a photo should be associated with a jpeg file (image/jpeg). A SHA256 hash in hex format is 64 characters long.

CREATE TABLE name11 (a uuid, h char(64));
CREATE TABLE image11 (a uuid, h char(64));

CREATE TABLE text_plain01 (h char(64));
CREATE TABLE image_jpeg01 (h char(64));

Still, we have no actual values in the model, just references to them. This is solved by adding a relation between the hash and the value.

CREATE TABLE text_value (h char(64), v text);
CREATE TABLE binary_value (h char(64), v bytea);

There will be one value relation for each native database type we want to use.

Large binary data may not be suitable to store in the database. It can also be more practical to fetch an image or document from a simple web server instead of the database. It depends on the system architecture. For these values, we introduce a relation to associate the value with an URL. The URL is itself a value with type text/plain.

CREATE TABLE location02 (h char(64), i char(64));

The mapping of mime types to database types and whether values of a certain type are stored in the database or not is knowledge that needs to be shared among the various parts in the system that interacts with the relational model.

The arrangement to use hashes as references to values allow us to store values redundant, duplicated at several locations. Typically, the location02 relation is populated by crawlers that find and track values stored in archives, on the web or local.

A legitimate concern with the proposed data model is the many joins required to put together a record with all values and associations for an entity. The overall performance of the data model may not be as good as a more complex relational model, but we are willing to trade performance for simplicity in this case. The goal is to allow the data model to grow dynamically, using just a few core concepts. To simplify the construction of queries, the plan is to design a specialized query language that can take advantage of the symmetries in the data model.

We observe that the relation appears_in20 has the cardinality many-to-many. How other cardinalities should be represented will be addressed later. First, we will take a step back and see how we can create and update relations.

A Distributed Event Log

Using the log at LinkedIn [3][4] as a source of inspiration, we will use a log of events as the original source of information flowing into the database. The log is kept in a directory in the filesystem. A log entry is one file, containing a JSON array of objects. Each event can either associate or dissociate entities and values using named relations.

[
  {
    "a": "daa8665c-df7c-4f09-959f-27f71eb65884",
    "r": "person10",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  },
  {
    "h": "245536acb08b0958f1aa78d391d5efc773e60fa2c39d942318aa04d6085bef40",
    "v": "Marcus",
    "r": "text_value"
  },
  {
   "h": "245536acb08b0958f1aa78d391d5efc773e60fa2c39d942318aa04d6085bef40",
    "r": "text_plain01",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  },
  {
    "a": "daa8665c-df7c-4f09-959f-27f71eb65884",
    "h": "245536acb08b0958f1aa78d391d5efc773e60fa2c39d942318aa04d6085bef40",
    "r": "name11",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  },
  {
    "a": "0b7a6027-756d-4f5e-ae1d-e36c593b5e95",
    "r": "photo10",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  },
  {
    "a": "daa8665c-df7c-4f09-959f-27f71eb65884",
    "b": "0b7a6027-756d-4f5e-ae1d-e36c593b5e95",
    "r": "appears_in20",
    "o": "associate",
    "t": "2015-01-27T15:28:50.445170+00:00"
  }
]

With a modest amount of mental effort, it is possible to map this notation to the relations defined earlier. JSON attributes 'a' - 'g' are entities and 'h' - 'n' are values. The first log entry that introduces a new relation will effectively create the corresponding table, indexes and views, allowing the schema to grow dynamically. The JSON attribute 'r' names the relation in question. The JSON attribute 'o' names the operation to perform on the relation, where 'associate' adds the association to the relation. The counteraction is 'dissociate', which removes the association from the relation. The log entry is fixed in time with the JSON attribute 't'. (We can also see that values are logged using a simpler relation without time and operation. The hash of a value simply exists as an immutable fact.) All of these attributes come together to create an append-only relation in the database.

CREATE TABLE appears_in20all (a uuid, b uuid, o text, t timestamp);

From this, we derive the appears_in20 relation. The plan is to include a pair (a', b') only if the number of rows with o='associate' for the pair is more common than the number of rows with o='dissociate'. Expressed alternatively, there must be more log entries that want to associate a' and b', than there are log entries that want to dissociate the two.

CREATE VIEW appears_in20 AS (
   SELECT a, b FROM (
      SELECT DISTINCT ON (a, b) a, b, o FROM appears_in20all
         GROUP BY o, a, b
         ORDER BY a, b, COUNT(o) DESC, o DESC) AS c
   WHERE o='associate');

The query groups the related entity pair with the operation in the inner SELECT and counts how many times each combination occurs. It then sorts them in descending order to get the most frequently occurring combination at the top. When the number of 'associate' and 'dissociate' operations are equal for a pair, 'dissociate' will end up on the top by a descending sort. It then picks the top row for each pair with DISTINCT ON (a, b). As a final touch, the outer SELECT will only keep the rows where the operation is 'associate'.
 
The reason for this arrangement is to let the database be distributable while trying to reach eventual consistency. We want to distribute the database to get data redundancy rather than load balancing. By synchronizing the event log directory between the participating nodes on the file level [5][6], changes from different nodes are merged. Just make sure the file names of the log entries are unique, for example a hash of the contents. I know I am out on thin ice here, but I believe this corresponds to an Available and Partition tolerant system in the context of the CAP theorem. The distributed system as a whole does not guarantee Consistency, even though it will eventually (sometimes) be consistent.

Another benefit is the ability to go back in time, by creating a view like appears_in20 with a time constraint. Yet another benefit is the possibility for an application to read the entire history of how a relation has evolved to become what it is today, and make it possible to undo changes.

The data doesn't have to be merged into a single log. It is possible to partition the log locally and let the data be merged in the database. You can partition on where the data comes from, who is the author, what domain it belongs to or any other reason. This makes it easy to incorporate, discard and recombine information from different sources in the local database.

Cardinality

So far we have defined a many-to-many relation. We would also like to define one-to-many and one-to-one relations. This can be done using the recorded timestamp to only select the most recent association. We will begin with a redefinition of the many-to-many relation.

CREATE VIEW appears_in20cnn AS (
   SELECT a, b, t FROM (
      SELECT DISTINCT ON (a, b) a, b, o, max(t) as t FROM appears_in20all
         GROUP BY o, a, b
         ORDER BY a, b, COUNT(o) DESC, o DESC) AS c
   WHERE o='associate');

CREATE VIEW appears_in20c1n AS (
   SELECT DISTINCT ON (b) a, b, t FROM appears_in20cnn
   ORDER BY a, t DESC);

CREATE VIEW appears_in20cn1 AS (
   SELECT DISTINCT ON (a) a, b, t FROM appears_in20cnn
   ORDER BY b, t DESC);

CREATE VIEW appears_in20c11 AS (
   SELECT a, b, t from appears_in20c1n
   INTERSECT
   SELECT a, b, t from appears_in20cn1);

Worth mentioning here is if a relation is used in different contexts, for example the relation portrait_of20 which relates both families and photos, and persons and photos, the relations with lower cardinality are type agnostic, meaning that a photo can only be associated one-to-one with either a person or a family.

Another observation is that for one-to-many relations, when we use the 'associate' operation to overwrite the side with cardinality one, the previous association will be pushed away into a stack of old associations. Later, when the current association is 'dissociated', the association on top of the stack will reappear. An example would be a person associated with the name 'Per' and later corrected with a new association with the name 'Pär'. When 'Pär' is dissociated, 'Per' will reappear as the associated name. To avoid this, the application will have to dissociate the old name when the name changes. This procedure also makes the many-to-many relation identical to the one-to-many relation.

To undo the effects of a log entry, just change 'associate' to 'dissociate' and vice versa to form the inverse log entry.

Case Study: the Family Album

Below is the auto generated schema for a relational model representing a family album.


Wrapping Up

We have designed a distributed relational data model, dynamically extensible by users through a replicated event log. Relations associate entities, identified by UUID, with other entities and/or values, identified by their hash. Values are stored in remote or local archives and are accessed using URLs. Values can also be stored in the event log and be cached in the data model. Users can access historical values of a relation, alongside many-to-many, one-to-many and one-to-one interpretations of the current value of the relation.



The architecture supports a number of variations:
  • Log Reader and File Crawler can work over the web.
  • Log synchronization must not be bidirectional.
  • Data sets from different domains can easily be merged.
  • It is always possible to drop and recreate the relational database from the Event Log. This may take a long time though, if there are many events.
And going into unknown territory, we might ponder:
  • How do we merge different uuids that represent the same thing? (The aka20 relation!)
  • How far will it scale? Are there any bottlenecks?
  • What would a domain specific query language look like? Compiled to SQL.
  • Can the Web Application itself be stored in the Archive? Can it be modified in the GUI? Can it compile replacement parts of itself from source at runtime? Integrated make. Can the compiler and the build system also be stored in the archive and be modifiable? Can we undo if the ability to change the application breaks due to changes to its development environment? What if undo also breaks? Which interpreter do we target?

References

[1] Out of the Tar Pit, Ben Moseley and Peter Marks
A text in line with my own thoughts on how to design information systems. In particular, I found section 7 particularly familiar, where they describe how to turn informal requirements into formal requirements, which are then executed. I have written about this earlier here and here.

[2] The Mess we're in, Joe Armstrong
36 minutes into the video, Joe introduces the idea to use the hashes as references to values.

[3] The Log: What every software engineer should know about real-time data's unifying abtraction, Jay Kreps
Inspiring article about the usefulness of the log.

[4] Turning the Databse Inside-out with Apache Samsa, Martin Kleppman
More thoughts around the log and materialized views.

[5] http://syncthing.net/
File replication software.

[6] http://www.getsync.com/
File replication software.

2013-04-24

The Architecture of a Synthesizer Sound Editor

Summary

With extensive use of parsing and runtime code generation together with a relational data model, it is possible to reach a 20-fold reduction in code size. Patched, a Nord Modular patch editor, uses less than 5,000 lines of Python, PyMeta2, and various DSLs to do the same job as Nomad, another Nord Modular patch editor, which is using more than 100,000 lines of Java.

Introduction

Patched (patch-ed) is the most recent result of my 10+ year long project to reverse engineer and reimplement the Clavia Nord Modular patch editor. But to complete the cross platform replacement is not my only goal. I also want to investigate interesting programming techniques.


A common way to program is to use a generic programming language to interpret input and generate output, updating internal data structures in between. This approach often results in lots of code, like Nomad, another editor for the same synth, with more than 100,000 lines of Java. For Patched, I was inspired by the research of VPRI and its somewhat different approach which results in a drastic reduction of code size. As I don't have very much spare time in front of a keyboard to type lots of code, this sounded attractive. Short code is also easier to overview and share. At the center of the VPRI approach is the concise representation of meaning using domain specific languages and runtime code transformations (compilation) to the target platform.

Another common coding practice I wanted to avoid was the object oriented data model. It requires lots of API code and structures the data in ways that make multiple simultaneous access patterns cumbersome. The relational model is a better way to organize data. A major drawback is the weak integration of query languages with the generic programming languages in common use. Using the VPRI approach, a DSL can be tailored to have good support for embedded queries when necessary.

Patched can be downloaded as a tarball from CVS. Check the README file for installation instructions.

Problem Domain

A Nord Modular patch editor uses MIDI to communicate with the synth to get the current patch and make changes to it. The editor also needs to present the patch in a human comprehensible form and let the user update it. Finally, it has to be able to load and save patch files to share with others and for backup purposes. The MIDI protocol is given by the synth. The patch file format is given by the legacy patch editor. What is left is the user interface, and I have chosen to implement a text based console UI.

Pattern Matching and Parsing

Patched is written in Python and PyMeta2, an implementation of OMeta for Python. OMeta is a language for pattern matching and parsing which is quite easy to work with. I use it extensively throughout the codebase...
  • to parse the three DSLs for Protocol definition, View composition and Model construction.
  • as a code generation target for the Protocol definition DSL.
  • to parse the MIDI communication.
  • to parse user commands and generated queries.
  • to parse patch file format.
All parsing results in a translation from the source language to a target language, Python or OMeta, which can be interpreted by the platform.
  • Model construction -> Python
  • View composition -> Python
  • Protocol definition (message parser) -> OMeta
  • Protocol definition (message generator) -> Python
  • MIDI message -> Mixcode
  • Patch file format -> Mixcode
  • Mixcode -> Python

Model Construction Language

The DSL used for data model construction makes it possible to create simple tables and fill them with data. Variables can be bound to items in Python lists to generate a span of rows where there is a correlation between columns.

map (name string, value number, caption string)
map 'knob' x:range(0,18) 'k{}'.format(x+1)
map 'knob' 18 'pedal'
map 'knob' 19 'after touch'
map 'knob' 20 'on/off switch'

The code above creates the table map with three columns and then fill it with data. The parser will translate it to the following code before it is evaluated by the Python interpreter in the model.py context.

query("create table map (name string, value number, caption string)")
for x in range(0,18):
 query("insert into map values ('{}',{},'{}')".format('knob',x,'k{}'.format(x+1)))
query("insert into map values ('{}',{},'{}')".format('knob',18,'pedal'))
query("insert into map values ('{}',{},'{}')".format('knob',19,'after touch'))
query("insert into map values ('{}',{},'{}')".format('knob',20,'on/off switch'))

 
It is not a huge reduction in number of lines for this DSL, but the number of bytes is somewhat reduced. An in memory instance of sqlite3 is used as database engine. As much of the program state as possible is stored in the relational model. Certain data is also added to the relational model to make it possible to solve problems directly in SQL and avoid the use of Python for data processing.

Patched data model: modular.model 
Model compiler and API: model.py

View Composition Language 

View composition is done using a language for 2D text layout with tight SQL integration. Blocks of text can be arranged in columns or rows to form new blocks of text. A block will expand to align with the blocks it is composed with, to make the resulting composition rectangular. Each block is configured individually how it should grow to fit.

A literal block of text is surrounded by parenthesis.

block1 =
  ('A typical block of text '
   'with two -$lines'
   ' '                       )
 
The second line has an extension point marked with '$'. This means that if the line needs to be expanded, it will insert copies of the character right before, in this case '-'. When the line is long enough both characters ('-$') are removed. The first and third line has an implicit '$' at the end of the line, which means they will expand using ' ', and the last ' ' will be removed when the line is long enough. The third line is not normally displayed, as it is the line to use for vertical expansion. If the block is put next to another block with more lines, this block will grow with space filled lines until aligned. The last line is removed when the block is long enough.

block2 =
  ('1 ' '2 ' '3 ' '4 ' ' ')

block3 =
  [ block1 block2 ]

Above, block2 is introduced with four numbered lines. Then the two blocks are composed as a row (brackets) in block3 with block1 to the left of block2. If block3 is rendered, the result would look like this.

A typical block of text1
with two ---------lines2
                       3
                       4

Column composition is the default behaviour for blocks, but can be selected explicitly using curly braces. This can be used to create columns of blocks inside a row composition.

block4 =
{ block1
  block2 }

is the same as 

block4 =
  block1
  block2

and will render as

A typical block of text
with two lines
1
2
3
4
 
Here it is also apparent that lines are not aligned until a composition so requires. To get a right aligned rendering of block4, it must be composed with the minimal block (' ') placed on the right side.

block5 =
  [ block4 (' ') ]

will render as

A typical block of text
with two ---------lines
1
2
3
4

A block can be inserted for each row in a query result set, to dynamically add zero or more blocks to the composition. Variables are bound to the columns in the result set, which can be inserted into blocks of text and other queries.

block6 =
  ('datum: {datum} ' ' ')
  * "select [datum] from x where time={t}"

For each row that match in table x, [datum] will bind to the value and one instance of the literal block will be created and added to the column. {t} is substituted with the value of t, which must have been bound earlier in the dynamic scope. When rendered it would look something like this.

datum: 12
datum: 54
datum: 6
datum: 112

A view composition is translated to Python code and evaluated in the context of a relational data model and predefined variable bindings.

Patch rendering: patch.view
Memory bank rendering: bank.view
Help rendering: help.view
View compiler and API: view.py

Protocol Definition Language

Binary protocols may not respect the character quanta that OMeta works with. Therefore it is necessary to make each singe bit accessible by OMeta by expanding the character vector into a bit vector. If [0x15, 0x7f] is a 7 bit per byte MIDI Sysex message, it will be expanded to the 14 bits [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1] before it is processed by OMeta.

The protocol definition syntax is then further simplified to be even shorter than what can be achieved with OMeta directly. A protocol definition is translated by OMeta to OMeta before messages are parsed. The same definition can also be translated by OMeta to Python, to make it possible to generate messages.

IAmPC
  0x33:7 0x00:5 slot:2 0x06:7
  0x00:7 versionHigh:7 versionLow:7;


IAmPC is the first message Patched sends to the synth to establish contact. '0x33:7' is a constant value represented by seven bits. 'slot:2' is a variable represented by two bits. To generate the binary representation of the message, one value per variable must be supplied to the generator.

generate('IAmPC', [0, 3, 3])
=> [0x33, 0x00, 0x06, 0x00, 0x03, 0x03]

The response to this message will be parsed with the following message definition.

IAmNM
  0x33:7 0x00:5 slot:2 0x06:7
  0x01:7 versionHigh:7 versionLow:7
  serial:21 deviceId:7
  @[
  "print 'Modular {} model {} v{}.{} connected.'".format(serial, deviceId, versionHigh, versionLow),
  "insert into synth (serial) values ({})".format(serial),
  "ack({})".format(serial)
  ];


After @ follows a Python list with Mixcode statements. The message parser will return a dictionary, where the key 'actions' is bound to this list. It will later be evaluated in the modular.py context.

The pad operator (%) will make sure the length of a message is evenly divisible with the given number, and pad with 0 to make it so.

It is possible to nest message definitions and also pass parameters when doing so.

The minus operator (-) will exclude the definition when searching for a match with a complete message. The definition can only be used as part of another message.

The multiplication operator (*) makes it possible to say how many times a definition should be repeated in sequence. The slash operator (/) can make the repetition terminate when the value specified after it is found.

Patch
  0x33:7 0x07:3 last:1 first:1 slot:2 0x06:7
  command:1 pid:6 paddedpart:Paddedpart(slot pid) checksum:7;
 

-Paddedpart(:slot :pid)%7
  partid:8 part:Patchpart(slot pid partid);


-Patchpart(:slot :pid 90)%8
  section:1 nmodules:7 names:Name*nmodules
  @[
  'delete from name where slot={} and section={}'.format(slot, section),
  'update name set slot={}, section={} where slot=-1'.format(slot, section),
  'delete from name where section=2',
  "insert into name (slot, section, indx, name) values ({}, 2, 1, 'MORPH')".format(slot),
  "tx('GetPatchPart2', {})".format([slot, pid] + [[0x4f, 0x01], [0x4e, 0x00]][section] + [0])
  ];


-Name
  indx:8 name:8*16/0
  @[
  "insert into name (slot, section, indx, name) values (-1,-1,{},'{}')".format(indx, ''.join([chrn(y) for y in name]))
  ];



Common protocol definitions: patch.protocol 
Host protocol definition: host.protocol
Modular protocol definition: modular.protocol, patch2.protocol
Protocol parser/generator and API: protocol.py

Mixcode

Mixcode is a mix of Python, SQL and user commands that is translated to pure Python before it is evaluated in the modular.py context.

Similar to the View Composition Language, it is possible to iterate over result sets and bind columns to Python variables. The following Mixcode

["select [datum] from x where time={t}",
 " print datum"                        ]

will compile to

for (datum,) in model.query("select datum from x where time={}".format(t)):
 print datum

User commands start with '#'. The command frontend just appends a '#' in front of user input to conform with Mixcode syntax. Everything else in Mixcode is assumed to be pure Python.

MIDI messages generated from user commands that are sent to the synth to update the patch state are usually injected back into the MIDI message parser, to also have them update the local state. This means user commands don't have to duplicate the database updates necessary to keep the local state in sync with the synth.

User commands can call other user commands by invoking the Mixcode interpreter call eval_mixcode() in the code they emit.

The patchfile compiler is used by the Mixcode compiler to generate code for the import command.

Mixcode compiler: command.py

Patchfile compiler

An OMeta parser is used to read pch files. The file is translated to MIDI messages which are transmitted to the synth and then read back from the synth to update the user view.

Patchfile compiler: patchfile.py

Patchfile writer

A View Composition specification is used to render a patch in the relational model to the pch file format.

Patchfile writer: patchfile.view

modular.py - the integration point

All components are tied together in modular.py, which interprets instructions written in Mixcode generated by the components in the system. The relational model, the MIDI protocol compiler/generator, the patchfile writer and the Mixcode compiler are all created by modular.py and can be used by Mixcode.

Distilled, there are a few fundamental operations available for Mixcode.
  • Transmit message to synth.
  • Wait for ack message.
  • Loopback generated MIDI message to parser for local state updates.
  • Update the relational model.
  • Render patch and bank view to terminal.
  • Render patchfile view to file.
  • Interpret more Mixcode.
Below is a flowchart that illustrates the flow of data/code in Patched. Code is data and data is code.



And here is a simplified component diagram.



Mixcode interpreter: modular.py

Simplifications Made Possible by the Nord Modular Deisgn

In the protocol and in the pch file format, a morph group is referenced as a parameter 0-3 in a module with index 1 located in module section 2. Instead of handling this as a special case, the morph module is added to the relational model as an ordinary module. This makes it possible to apply rendering code for modules in general to the morph groups.

The command 'new' is used to fill a slot with an empty patch. This is just syntactic sugar for 'import new.pch'. To treat 'new' as an ordinary patch import means we don't have to write any special code to initialize the empty patch. It is also possible for the user to 'export new.pch' to make changes to the initial state.

Observations

The API disappears (almost). Traditionally, I have spent a lot of time writing and calling APIs. Not so when using runtime code generation. The code generator can skip over the API, and generate highly expressive code directly in the target language. A little API is required to integrate the various components like sqlite3 and Port Midi, to reach through to the metal in an efficient manner. Each parser also needs a compile or interpret call, and that's it.

You don't need to go all the way to a completely modifiable metacircular programming system to make good use of runtime code generation. Patched is a traditional write-compile-run application.

The code is compact, but not incomprehensible. Everything is there right in front of you at the right level of abstraction when it needs  to be modified. Code written in a generic programming language tends to have much more accidental complexity, imposed by the generality of the language.

Code is data - Code can be constructed and manipulated just like data.
Data is code - All data is encoded in some language that can be compiled and/or interpreted to manipulate the system state.

2011-09-20

PyMeta2 Example

I found it difficult to find examples on the use of PyMeta2, an implememtation of the OMeta language in Python. I mamaged to put together a small example myself on how to parse a PGP public key block and extract the DSA key.

pgp.py:

from pymeta.grammar import OMeta

# Base 64 decoder, see RFC 2045.
grammar = """
code = letter:x ?(x <= 'Z') -> (ord(x) - ord('A'))
| letter:x ?(x <= 'z') -> (ord(x) - ord('a') + 26)
| digit:y -> (ord(y) - ord('0') + 52)
| '+' -> (62)
| '/' -> (63)
sixbits = spaces code:z -> (z)
| code:z -> (z)
pad = spaces '='
| '='
group = sixbits:a sixbits:b sixbits:c sixbits:d
-> ([(a<<2)+(b>>4), ((b&0xf)<<4)+(c>>2), ((c&0x3)<<6)+d])
| sixbits:a sixbits:b sixbits:c pad
-> ([(a<<2)+(b>>4), ((b&0xf)<<4)+(c>>2)])
| sixbits:a sixbits:b pad pad
-> ([(a<<2)+(b>>4)])
data = data:s group:z -> (s + z)
| group:z -> (z)
packet = data:d spaces '=' data:c spaces end -> ({'data': d, 'checksum': c})
"""

parser = OMeta.makeGrammar(grammar, {})

# Data from PGP PUBLIC KEY BLOCK
code = """
mQGiBE5cxWkRBACgm0DoNQj5HTTcQLL/eooKYMJTmYiINk0bSfQ46+R1iQliL/67
K0MTUZltFv3z70ATY9HulMgmlutDsFRGG4yUWceQX0qNBbJv2Q7FzB2rAFd/XfHT
tB8EOYc1fZk2RCDk0m5Ef33kwVXMm7/+MyvJ4GWxd8rvEsvrLdVZGufZ1wCg+K9p
Qgm542nvRI8LZN91yZGEiMcD/Rv6+oYykhOfU4ozxpOv2CLjzF3qk+D8TvVFEDrE
yjfoZZBGBHWY8bWyMsWONjk3o2DFSwQ5qwhz/bgcUtTPj8qU9K8KyrLB1mmiRTj1
+6svwUORoOCqKmTXkna7bmzzdlsqRQR2g9AdMHuIDkynyE7PBEYO6r7roUZZVtHU
TIRiBACRiV9ezsDsKRNfFBvoI5Lip7aA81E3+aesDGvGLqCeje3NIJfZ5q0q02iE
WTkNduAwxjcP+Q6QnakCoTSK7tlyiFWTw+bc0fq4V72hey+PqBclk8gRFDoCSuCF
vJOty98GfATei8fBJIpL4+hLNSGbN6QIKWtakS3IorWVXfeqHrQHVGFjdGl2b4hg
BBMRAgAgBQJOXMVpAhsDBgsJCAcDAgQVAggDBBYCAwECHgECF4AACgkQ9t723H2z
gCa0hgCcCH6NcL6fBrf3uHckMAFvPLub0SYAn2HeLhGECfI8pJ06S7aT/oPX0tyU
uQENBE5cxWkQBACGpC50KVpoy+4QmyIUHtSVQ4gbfeGOMmHOZlRRs41t93iqdf8F
y0aDUiK1G4AbnpHgJL/IFfH2HUPn1v7A4zRP9kd14k7+V5hjtiFj6m+USz/9M2Cd
CCWC0b7JAwK0egCDoGfdDPZBXF73ec5ObntinBg+YqBy7GDMCsFgZ8fjKwADBgP9
HUCsOp+BikAKHErqi5p73vRkzGq4kLs32ixwWP/6F2wcR2mJlgR7ET0gopvlkx9A
fDptYO4TG0nNVtBF/GqEbUIBcEfvpDVtVgstW25Aj367x06ySOWD1A/7No3j2YY8
bXPRxTDmUpyJNhcFDauaSClffI75UbYTMreuG9JbiYuISQQYEQIACQUCTlzFaQIb
DAAKCRD23vbcfbOAJqBvAKDocpghK+6fgmDiz6NDnWsnjKHOLgCfVuO99xoUFd+b
IWBs+tbQvkoRK8U=
=9tpq
"""

packet = parser(code).apply("packet")


# Checksum calculator
def crc24(octets):
INIT = 0xB704CE
POLY = 0x1864CFB
crc = INIT
for octet in octets:
crc ^= (octet << 16)
for i in xrange(8):
crc <<= 1
if crc & 0x1000000: crc ^= POLY
return crc & 0xFFFFFF

# Compare values in checksum manually
checksum = list()

# Checksum in code
checksum.append(packet[0]['checksum'][0]*256*256 + packet[0]['checksum'][1]*256 + packet[0]['checksum'][2])

# Calculated checksum
checksum.append(crc24(packet[0]['data']))


# DSA public-key packet parser, see RFC 4880
blockgrammar = """
number 0 = -> (0)
number :m = number(m-1):msb anything:lsb -> (msb*256+lsb)

length 0 = number(1):l -> (l)
length 1 = number(2):l -> (l)
length 2 = number(4):l -> (l)

mpi = number(2):bits number((bits+7)/8):i -> ([bits, i])

block = anything:tag length(tag&0x03):l anything:ver number(4):t anything:algo mpi:p mpi:q mpi:g mpi:y
-> ({'tag': (tag&0x7f)>>2, 'length': l, 'version': ver, 'time': t, 'algorithm': algo, 'p': p, 'q': q, 'g': g, 'y': y})
"""

blockparser = OMeta.makeGrammar(blockgrammar, {})

block = blockparser(packet[0]['data']).apply("block")


A session using this code would look something like this:

Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pgp
>>> pgp.packet
({'checksum': [246, 218, 106], 'data': [153, 1, 162, 4, 78, 92, 197, 105, 17, 4,
0, 160, 155, 64, 232, 53, 8, 249, 29, 52, 220, 64, 178, 255, 122, 138, 10, 96,
194, 83, 153, 136, 136, 54, 77, 27, 73, 244, 56, 235, 228, 117, 137, 9, 98, 47,
254, 187, 43, 67, 19, 81, 153, 109, 22, 253, 243, 239, 64, 19, 99, 209, 238, 148
, 200, 38, 150, 235, 67, 176, 84, 70, 27, 140, 148, 89, 199, 144, 95, 74, 141, 5
, 178, 111, 217, 14, 197, 204, 29, 171, 0, 87, 127, 93, 241, 211, 180, 31, 4, 57
, 135, 53, 125, 153, 54, 68, 32, 228, 210, 110, 68, 127, 125, 228, 193, 85, 204,
155, 191, 254, 51, 43, 201, 224, 101, 177, 119, 202, 239, 18, 203, 235, 45, 213
, 89, 26, 231, 217, 215, 0, 160, 248, 175, 105, 66, 9, 185, 227, 105, 239, 68, 1
43, 11, 100, 223, 117, 201, 145, 132, 136, 199, 3, 253, 27, 250, 250, 134, 50, 1
46, 19, 159, 83, 138, 51, 198, 147, 175, 216, 34, 227, 204, 93, 234, 147, 224, 2
52, 78, 245, 69, 16, 58, 196, 202, 55, 232, 101, 144, 70, 4, 117, 152, 241, 181,
178, 50, 197, 142, 54, 57, 55, 163, 96, 197, 75, 4, 57, 171, 8, 115, 253, 184,
28, 82, 212, 207, 143, 202, 148, 244, 175, 10, 202, 178, 193, 214, 105, 162, 69,
56, 245, 251, 171, 47, 193, 67, 145, 160, 224, 170, 42, 100, 215, 146, 118, 187
, 110, 108, 243, 118, 91, 42, 69, 4, 118, 131, 208, 29, 48, 123, 136, 14, 76, 16
7, 200, 78, 207, 4, 70, 14, 234, 190, 235, 161, 70, 89, 86, 209, 212, 76, 132, 9
8, 4, 0, 145, 137, 95, 94, 206, 192, 236, 41, 19, 95, 20, 27, 232, 35, 146, 226,
167, 182, 128, 243, 81, 55, 249, 167, 172, 12, 107, 198, 46, 160, 158, 141, 237
, 205, 32, 151, 217, 230, 173, 42, 211, 104, 132, 89, 57, 13, 118, 224, 48, 198,
55, 15, 249, 14, 144, 157, 169, 2, 161, 52, 138, 238, 217, 114, 136, 85, 147, 1
95, 230, 220, 209, 250, 184, 87, 189, 161, 123, 47, 143, 168, 23, 37, 147, 200,
17, 20, 58, 2, 74, 224, 133, 188, 147, 173, 203, 223, 6, 124, 4, 222, 139, 199,
193, 36, 138, 75, 227, 232, 75, 53, 33, 155, 55, 164, 8, 41, 107, 90, 145, 45, 2
00, 162, 181, 149, 93, 247, 170, 30, 180, 7, 84, 97, 99, 116, 105, 118, 111, 136
, 96, 4, 19, 17, 2, 0, 32, 5, 2, 78, 92, 197, 105, 2, 27, 3, 6, 11, 9, 8, 7, 3,
2, 4, 21, 2, 8, 3, 4, 22, 2, 3, 1, 2, 30, 1, 2, 23, 128, 0, 10, 9, 16, 246, 222,
246, 220, 125, 179, 128, 38, 180, 134, 0, 156, 8, 126, 141, 112, 190, 159, 6, 1
83, 247, 184, 119, 36, 48, 1, 111, 60, 187, 155, 209, 38, 0, 159, 97, 222, 46, 1
7, 132, 9, 242, 60, 164, 157, 58, 75, 182, 147, 254, 131, 215, 210, 220, 148, 18
5, 1, 13, 4, 78, 92, 197, 105, 16, 4, 0, 134, 164, 46, 116, 41, 90, 104, 203, 23
8, 16, 155, 34, 20, 30, 212, 149, 67, 136, 27, 125, 225, 142, 50, 97, 206, 102,
84, 81, 179, 141, 109, 247, 120, 170, 117, 255, 5, 203, 70, 131, 82, 34, 181, 27
, 128, 27, 158, 145, 224, 36, 191, 200, 21, 241, 246, 29, 67, 231, 214, 254, 192
, 227, 52, 79, 246, 71, 117, 226, 78, 254, 87, 152, 99, 182, 33, 99, 234, 111, 1
48, 75, 63, 253, 51, 96, 157, 8, 37, 130, 209, 190, 201, 3, 2, 180, 122, 0, 131,
160, 103, 221, 12, 246, 65, 92, 94, 247, 121, 206, 78, 110, 123, 98, 156, 24, 6
2, 98, 160, 114, 236, 96, 204, 10, 193, 96, 103, 199, 227, 43, 0, 3, 6, 3, 253,
29, 64, 172, 58, 159, 129, 138, 64, 10, 28, 74, 234, 139, 154, 123, 222, 244, 10
0, 204, 106, 184, 144, 187, 55, 218, 44, 112, 88, 255, 250, 23, 108, 28, 71, 105
, 137, 150, 4, 123, 17, 61, 32, 162, 155, 229, 147, 31, 64, 124, 58, 109, 96, 23
8, 19, 27, 73, 205, 86, 208, 69, 252, 106, 132, 109, 66, 1, 112, 71, 239, 164, 5
3, 109, 86, 11, 45, 91, 110, 64, 143, 126, 187, 199, 78, 178, 72, 229, 131, 212,
15, 251, 54, 141, 227, 217, 134, 60, 109, 115, 209, 197, 48, 230, 82, 156, 137,
54, 23, 5, 13, 171, 154, 72, 41, 95, 124, 142, 249, 81, 182, 19, 50, 183, 174,
27, 210, 91, 137, 139, 136, 73, 4, 24, 17, 2, 0, 9, 5, 2, 78, 92, 197, 105, 2, 2
7, 12, 0, 10, 9, 16, 246, 222, 246, 220, 125, 179, 128, 38, 160, 111, 0, 160, 23
2, 114, 152, 33, 43, 238, 159, 130, 96, 226, 207, 163, 67, 157, 107, 39, 140, 16
1, 206, 46, 0, 159, 86, 227, 189, 247, 26, 20, 21, 223, 155, 33, 96, 108, 250, 2
14, 208, 190, 74, 17, 43, 197]}, _MaybeParseError(1194, [('message', 'end of inp
ut')]))
>>> pgp.checksum
[16177770, 16177770]
>>> pgp.block
({'q': [160, 1419741510835642352048304995189763742904405493959L], 'p': [1024, 11
27816910289527974061075044173534361876859840537664278626449706640702688418029301
77627320915504464134384919932726006749358191240819384430019080661316770630666273
73305307310915789743932633603785654698308793554266637396727988530315573294163969
5505278895830548483021403317423418546610032441185543867874125797847L], 'length':
418, 'tag': 6, 'g': [1021, 1964849467881428710466800502308496822275241438855066
35238515330556083042068220950919284412399840170713771945402998891574208390341039
32135373724979642618657215923305914952852917206440714185965098155864193815365920
66817031665349355835965700904192728740918402151294857530330803724346461299359845
6219939378070626L], 'version': 4, 'algorithm': 17, 'time': 1314702697, 'y': [102
4, 10219928411694978026422720720306614004007482190710640189044175475269784538174
10827189912669049493898343302013241201652486175229135262490562087262759959706358
79842784809961205983751216640065755418899948380797160593911267435665817196921472
667773602697266619313590993808832086874091786796761538144131865973860894L]}, _Ma
ybeParseError(293, [('expected', None, 0)]))
>>>

2011-09-16

Compass Belt

So, I made my own compass belt, like many others. I was inspired by the Wired article published a couple of years ago.



The core parts are a HMC6352 compass module connected to a 3.3V Arduino Pro Mini. A ULN2803 with eight Darlington drivers supply power to the vibrators positioned at even intervals around the belt. Everything is powered by a 9V battery.



I found some sample code to read the bearing from the compass, and then added the vibrator control myself.


#include <Wire.h>

int compassAddress = 0x42 >> 1; // From datasheet compass address is 0x42
// shift the address 1 bit right, the Wire library only needs the 7
// most significant bits for the address
int reading = 0;

int bearing = 0;
int offset = 10;

void setup()
{
Wire.begin(); // join i2c bus (address optional for master)
Serial.begin(9600); // start serial communication at 9600bps
pinMode(13, OUTPUT);
digitalWrite(13, HIGH);

pinMode(2, OUTPUT);
pinMode(3, OUTPUT);
pinMode(4, OUTPUT);
pinMode(5, OUTPUT);
pinMode(6, OUTPUT);
pinMode(7, OUTPUT);
pinMode(8, OUTPUT);
pinMode(9, OUTPUT);
}

void loop()
{
// step 1: instruct sensor to read echoes
Wire.beginTransmission(compassAddress); // transmit to device
// the address specified in the datasheet is 66 (0x42)
// but i2c adressing uses the high 7 bits so it's 33
Wire.send('A'); // command sensor to measure angle
Wire.endTransmission(); // stop transmitting

// step 2: wait for readings to happen
delay(10); // datasheet suggests at least 6000 microseconds

// step 3: request reading from sensor
Wire.requestFrom(compassAddress, 2); // request 2 bytes from slave device #33

// step 4: receive reading from sensor
if (2 <= Wire.available()) // if two bytes were received
{
reading = Wire.receive(); // receive high byte (overwrites previous reading)
reading = reading << 8; // shift high byte to be high 8 bits
reading += Wire.receive(); // receive low byte as lower 8 bits
reading /= 10;
Serial.println(reading); // print the reading
}

// Activate vibrators
offset = -offset;
bearing = reading + 180 + offset;
if (bearing > 359) bearing -= 360;
if (bearing < 0) bearing += 360;

digitalWrite(9, bearing >= 0 && bearing < 45 ? HIGH : LOW);
digitalWrite(8, bearing >= 45 && bearing < 90 ? HIGH : LOW);
digitalWrite(7, bearing >= 90 && bearing < 135 ? HIGH : LOW);
digitalWrite(6, bearing >= 135 && bearing < 180 ? HIGH : LOW);
digitalWrite(5, bearing >= 180 && bearing < 225 ? HIGH : LOW);
digitalWrite(4, bearing >= 225 && bearing < 270 ? HIGH : LOW);
digitalWrite(3, bearing >= 270 && bearing < 315 ? HIGH : LOW);
digitalWrite(2, bearing >= 315 && bearing < 360 ? HIGH : LOW);

delay(10);

digitalWrite(9, LOW);
digitalWrite(8, LOW);
digitalWrite(7, LOW);
digitalWrite(6, LOW);
digitalWrite(5, LOW);
digitalWrite(4, LOW);
digitalWrite(3, LOW);
digitalWrite(2, LOW);
}


The vibrators are glued on velcro, to be easily adjustable for various waist sizes. The cables are nicely embedded in velcro as well.

One of my friends advised me not to go on the subway with this thing, unless I wanted to get arrested. May you live in interesting times.

2011-05-19

Albumplayer - A MPD Web Client

I have written a simple web client for the Music Player Daemon (MPD). It is suitable to use on mobile devices, like my Android phone. It allows me to select artists from my music collection and play their music on my headless media frontend.

To do anything useful, Albumplayer requires an MPD instance. Install that first. Then, extract albumplayer.tgz where you want it located on your web server. Edit mpd.expect to connect to your MPD instance. (Change the IP address.) Albumplayer is written in bash, expect, grep, sed, awk, Python and PHP. Make sure you have them installed on your system before use.

I tried a couple of lightweight web clients before I wrote Albumplayer, but all seemed to require that I first defined playlists in MPD, which then could be played in the client. That is not how I want it to work. I want to select artists directly from the database and play their music in a straightforward manner.

Albumplayer is completely insecure. Don't run it on a public web server.

Update DB will not work in the general case, when it fails to finish before Expect terminates the connection. An opportunity for improvement.

Enjoy your music.