Formal Language: Makron - template notation and evaluator

Makron (swedish for macaron) is the programming language used for text-based code generation in the Biotope application server. It is a kind of template or string processing language. It is typically used for server side generation of HTML code, but it is also suitable for any text based target languages like Graphviz, JSON, CSS and Javascript. The current implementation of the interpreter is tightly coupled with the relational model described in A Distributed and Relational Data Model.

Execution starts by invoking a well known template that is stored as a Code entity in the relational model and ends when the template is completely evaluated. A Makron template has access to the invocation URL and its parameters, can iterate over the result from a query to the relational model and can download content from the net and use all of these sources when composing its text-based result. Conditional execution is driven by regex matching, which also handles the limited parsing that is possible. A future goal is to include a proper parser generator. A Makron program is typically responsible for creating the response to GET requests to a Biotope servlet.

A Makron program cannot directly modify the relational data model. Instead, it is the JS/HTML application generated by the Makron evaluator that can modify the model through the Logger and Archiver servlets in Biotope.

High Level Language Overview

Everything is UTF-8 encoded text.
A string of text can be bound to a symbol. A symbol can either represent data or template code.
A string of text can be evaluated, which means that template invocations that are found in the text are substituted with the result of the recursive evaluation of the template (in a new local symbol scope). Templates are invoked using a language construct called a macaron, e.g. {template_name.arg1.arg2.arg3}. The rest of the text being evaluated is left unchanged.
Macaron start and end chars and argument delimiter chars are dynamically exchangeable to easily adapt to the properties of the target language.
Template arguments can either be evaluated in the invocation context or in the template context. The template itself explicitly performs eager, lazy or no evaluation of its arguments.

Hello, world!

A trivial program with no template invocations (macarons) will just output itself as the result.

Hello, world!

>>> Hello, world!

Before we get to more advanced evaluation rules we need some definitions.

Symbols

A symbol name can use the characters a-z, A-Z, 0-9, _ and -.

This_is_a_Symbol

Macarons

The default macaron start and end chars are '{' and '}'. It is a macaron that triggers template invocation and all characters in the macaron are part of the invocation.

{This is a macaron}

It is possible to change the macaron chars dynamically in the source by putting a macaron framed with new macaron chars directly inside a macaron. Valid macaron chars are '(', ')', '{', '}', '[', ']', and '<', '>'.

{(Parenthesis are now used as macaron chars within this macaron.)}

The change is tied to the lexical scope and does not affect templates that are invoked from within the macaron.

Escaped Chars

The escape char '\' can be used to prevent the normal interpretation of characters with special meaning, for example if you want to type a macaron char but don't want to start or end a macaron. It can also be used to insert newline '\n' and tab '\t'. A backslash is written '\\'.

Delimiter Chars

A delimiter char is a character not part of any of the other categories above. It is used to separate template name and the arguments in a macaron.

Now we have what we need to go on with template evaluation.

Template Evaluation

During evaluation, the template source code is read character by character and they are typically passed on unprocessed to the output stream. Escaped characters are passed on to the output stream without the escape char '\'.

Plain \{text\}

>>> Plain {text}

Template Invocation

When a macaron is encountered it is not passed on to the output stream. Instead, template invocation is initiated. First, a template name followed by a delimiter char is scanned for. The process is a form of evaluation and other templates can be invoked during the scan. When a delimiter char is found, evaluation stops and the rest of the unprocessed input stream up to the end of the macaron is treated as the template argument. The source code of the new template is loaded and evaluated in a new scope with the variables self, '.', body, start and end bound to new values.

{template/with/{argument}/body}

Bindings in new evaluation scope:

self = template
. = /
body = with/{argument}/body
start = {
end = }

Function: = (let)

Makron is dynamically scoped with all the usual drawbacks, but don't dismiss this language just yet. For small programs implementing view generation, it is quite convenient to not have to specify all necessary parameters in every template invocation. Symbols are semi-global, i.e. global in its local execution branch. The code structure with one template per file also doesn't make static scope very useful.

A symbol always refer to a stream of UTF-8 encoded text. Symbols can be defined in the local scope using the built-in function let. It will evaluate both of its arguments before binding the first argument, the symbol name, to the second argument, the value.

{let.fn1.Hello, world!}

or

{=fn1=Hello, world!}

The content of the symbol fn1 can then be evaluated, i.e. a template invocation.

Makron says "{fn1}"

>>> Makron says "Hello, world!"

Template: ' (quote)

We will now define a second symbol containing code that calls fn1.

{=fn2={'{fn1}}}

The argument to let is always evaluated, therefore we have to prevent the premature evaluation of {fn1} with the quote template. We could also escape the bracket or temporarily changed the macaron chars with the same effect.

{let.fn2.\{fn1\}}
{let.fn2.{( {fn1})}}

The result when evaluating fn2 is the same, as expected.

Makron still says "{fn2}"

>>> Makron still says "Hello, world!"

Function: $ (value)

If we instead want to get the content of the symbol fn2 without evaluating it, we can use the built-in function $.

{$$fn2}

>>> {fn1}

The reason for the double $$ is that $ takes two arguments and we have given an empty first argument. It is very common that strings need to be escaped in different ways when inserting them in program code. Therefore the $ function takes a first argument that specify what kind of escaping that needs to be applied before output. A number of escape schemes are defined, e.g. qoute, squote, html, url.

{$url$fn2}

>>> %7Bfn1%7D

Function: [space] (identity)

As we saw above we can use the space function as a function that just evaluates and outputs its argument. This is for example useful when we want to change macaron chars but don't want to invoke a template as the first thing.

{( {fn1})}

>>> {fn1}

Functions: first and rest

If we break down the anatomy of a template invocation we can see that it consists of a template name followed by a delimiter char and the rest is the argument. Yes, always a single argument. The template itself needs to break up the single argument into parts if it needs multiple arguments. To be able to do this it can access the delimiter char that was used in the invocation with the template '.', i.e. {.}. The unevaluated argument itself is bound to the variable body.

There are two helper functions first and rest that are useful when splitting up body. first returns the part before the first delimiter char and rest returns the part after the first delimiter char.

Function: @ (arg)

{arg.a1.{first{.}{$$body}}}
{arg.a2.{first{.}{rest{.}{$$body}}}}
{@a3@{rest{.}{rest{.}{$$body}}}}

The function arg works like let with the difference that it evaluates the second argument twice, first in the local scope and the second time in the scope of the caller. This is the mechanism that turns the template into a function with eager evaluation of its arguments.

By convention, the last argument is assigned to what is left without looking for the delimiter char, i.e. no call to first. This means that the last part of body can include the delimiter char as data without the need to escape it.

Normally, a function splits its argument at the delimiter chars and then evaluates the parts separately. This allows variables containing arbitrary strings to be passed as arguments. If the argument is first evaluated as a whole and then split at the delimiter, the arbitrary strings that are inserted into the argument may have introduced unwanted delimiter or macaron chars that will ruin the intended argument separation. first and rest are different though. They will first evaluate their argument and then look for the delimiter char.

Function: ~ (eval)

Sometimes it can be handy to be able to evaluate a string as a template without first bind it to a symbol. The function eval will evaluate its argument and then evaluate the resulting string once more.

Function: ^ (upeval)

The upeval function is a mix of eval and arg. It will evaluate its argument and then evaluate the resulting string in the scope where the current template was invoked.

Function: ? (query)

To get data from the relational model, the function query is used. The function evaluates the first argument and sends it as a sql query to the database. The second argument, when evaluated, is the code that will be evaluated for each row in the query result. The third argument, when evaluated, is the code that will be evaluated between each row. Brackets in the query will mark the symbol name to bind to each column in the result.

{query|SELECT s.n AS [num] FROM (VALUES(1),(2),(3),(4),(5)) s(n)|{'{$$num}}|, }

>>> 1, 2, 3, 4, 5

Function: % (regex)

Using regular expressions, it is possible to conditionally evaluate Makron code. When the regex matches a string, one code path is evaluated. If it doesn't match, another code path is evaluated. The whole input string must match, not just a part of it. The regex can also be used as a simple parser as regex groups can be bound to variables in Django style.

Pre-Defined Symbols

When the Makron evaluator is running in a servlet context we very much like to get the URL and its parameters to be able to respond accordingly. The following symbols are pre-defined in such a context.

contexturl - Contains the URL to the servlet engine.
rooturl - Contains the ULR to the servlet.
url - Contains the part of the URL after rooturl. Always starts with '/'.
p_xyz - Contains the value of the URL parameter xyz.

Symbols in the Relational Data Model

I mentioned earlier that the first function that was invoked would be retrieved from the relational data model. Symbols stored in the relational data model are available in the outermost scope. A symbol is represented by an entity in the relation code10c1. It is associated with a name through the chain code10c1-name11c11-text_value. The contents is stored in a web archive with an URL found through the chain code10c1-data11cn1-location02cn1-text_value.

Function: location

For symbols stored in the relational data model, it is possible to get the URL where the template data is stored.

<script src="{location.jquery}" type="text/javascript"></script>

>>>
<script src="https://kornet1.se/archive/biotope/95a5d6b46c9da70a89f0903e5fdc769a2c266a22a19fcb5598e5448a044db4fe" type="text/javascript"></script>

Undefined Templates

Macarons that invoke an undefined template are ignored and can be used as source code comments.

Arithmetic Operations

Ongoing...

Input Qualification