

px Language

In my studies with assembly language, a couple of things have occurred to me:

1. Assembly syntax is SIMPLE.
2. Instructions, registers, stack pointers, etc. are incredibly confusing.

What I mean by the syntax being simple is that it is completely linear. If the code jumps around, it explicitly tells you where it's jumping and why. All lines can be understood as self-contained statements, and the only structure to the program is the structure you give it.

With this in mind, I have set out to design a new language based on this syntax. The language will be functional and type-safe like Haskell, but it won't have ANY built-in syntax. I have begun to build a parser in Python which does the px-to-asm conversion. It's very simple at this stage, but should work.

The language is read, for the most part, right to left, with the leftmost term(s) being the terminal element. The only exception to the rule might be infix operators, because I'm not sure many people would take up a language if they had to learn reverse Polish notation.

The Basics

Here's a simple example of naming symbols and comparing them.
x, y  20, 15
z 42
z x if x > y
z y if y > z
print z
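As a rough illustration of how the two conditional lines above might evaluate on a stack, here is a small Python sketch. It assumes x = 20 and y = 15 as intended, that the right operand of '>' is pushed first, and that an assignment only happens when the flag travelling with a value is 1. The helper names greater_than and guarded_assign are made up for this example; they are not part of px or its parser.

```python
# Hypothetical model of px's conditional assignment; not the real px parser.
stack = []

def greater_than():
    """Pop two operands and push 1 if left > right, else 0."""
    left = stack.pop()     # the left operand was pushed last, so it is on top
    right = stack.pop()
    stack.append(1 if left > right else 0)

def guarded_assign(env, target, value_name, left_name, right_name):
    """Model `target value if left > right`."""
    stack.append(env[right_name])    # right operand is pushed first
    stack.append(env[left_name])     # left operand ends up on top
    greater_than()                   # leaves a 0 or 1 on the stack
    flag = stack.pop()
    stack.append(env[value_name])    # `if` pushes the value ...
    stack.append(flag)               # ... followed by its flag
    flag, value = stack.pop(), stack.pop()
    if flag:                         # a false flag means no assignment
        env[target] = value

env = {"x": 20, "y": 15, "z": 42}
guarded_assign(env, "z", "x", "x", "y")   # z x if x > y
guarded_assign(env, "z", "y", "y", "z")   # z y if y > z
print(env["z"])
```

Under these assumptions the first line stores 20 in z and the second leaves it alone, so the sketch prints 20.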
x, y and z are assigned the values 20, 15 and 42. For the comparison, y and x are pushed onto the stack, and '>' pops them and compares them, putting a 0 or 1 on the stack. The if statement takes two operands: x and the boolean result of the '>' operation. It then pushes x and [0/1] onto the stack (secret rule #1).

Secret rule #1: Values are always stored in two parts, a value and a flag. If the flag is false, the assignment doesn't happen.

I have chosen to omit the assignment operator, because all evaluations logically flow right to left. You will see the benefit of using infix operators if you reduce this operation to its RPN form, with parens added for clarity:

z (if ( x (> y x)))

The > and other mathematical terms can be understood with some practice in RPN, but the if statement is not very intuitive at all without infix notation.

Another curiosity that needs addressing is that, since this is stack-based assignment, the first assignment "x,y 20,15" just adds the numbers 15 and 20 onto the stack and then pops them off, making y = 20 and x = 15. Not the preferred functionality!

Functions

Simple enough for basic expressions, but how do we implement more complex functions? I've carried the same logic into function expressions. Functions always produce the left-hand value based on the right-hand input, but you add a code block below the statement line. This is how it looks in practice:
x 15
y 12
.x some_function .x, .y:
   .x sum(.x, .y)
print x
I almost went with Python-esque significant whitespace, but have decided for now to close blocks with a final colon. The function's header defines its inputs and outputs. The function takes two values, x and y, passes them through the function block and produces x. There are no return statements, because the output symbol is explicitly given in the header line.

Okay, the syntax is simple enough, but what can you do with it, and what are the leading periods all about? That's where namespaces come into play...

Namespaces

Each block defines its own namespace. The previous function declared inputs of .x and .y. These are concrete references to the parent namespace. The output was also a concrete reference, so the function was fully defined and executed to produce the new value of x. Here's an example that, although it looks very similar, is actually quite different:
x 15
y 12
m generic_function m, n:
  m sum(m, n)
x generic_function x, y
print x
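One way to picture the difference between the dotted header and the generic one is the following Python sketch, in which plain dicts stand in for px namespaces. It assumes that a block whose names are all dotted runs as soon as it is defined, while a generic block runs only at its call site. The dict names concrete_ns and generic_ns are invented for the illustration.

```python
# Hypothetical Python model of px namespaces; dicts stand in for px blocks.

def some_function(parent):
    """Model `.x some_function .x, .y:` where every name is a parent reference."""
    parent["x"] = parent["x"] + parent["y"]   # .x sum(.x, .y)

def generic_function(m, n):
    """Model `m generic_function m, n:` where m and n are local placeholders."""
    m = m + n                                 # m sum(m, n)
    return m                                  # the header names m as the output

# Concrete block: assumed to run as soon as it is defined.
concrete_ns = {"x": 15, "y": 12}
some_function(concrete_ns)

# Generic block: runs only at the call site `x generic_function x, y`.
generic_ns = {"x": 15, "y": 12}
generic_ns["x"] = generic_function(generic_ns["x"], generic_ns["y"])

print(concrete_ns["x"], generic_ns["x"])
```

Both paths end with x holding 27. The difference is that the concrete version is tied to one particular x and y, while the generic one can be reused on any pair of values.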
generic_function is more like a normal function you would use, because it is defined with generic inputs and outputs and then applied to specific values. The function is only executed when called with specific values. This opens the door for partially applied functions and the like, e.g.:
m part_func .x, m:
   m sum(.x, m)
y part_func y

m const_function .x, .y:
  print .x, .y
  m True
res const_function

.x defined_out m, n:
   .x sum(m, n)
defined_out 8, 1024
print x
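The part_func and defined_out blocks above can be loosely mirrored in Python with functions that hold on to the parent namespace. This is only a sketch: it assumes a dotted name such as .x is looked up in the parent at call time, and the ns parameter is a stand-in invented here for that parent reference.

```python
# Hypothetical model: `parent` plays the role of the enclosing px namespace.
parent = {"x": 15, "y": 12}

def part_func(m, ns=parent):
    """Model `m part_func .x, m:`; .x comes from the parent, m from the caller."""
    return ns["x"] + m              # m sum(.x, m)

def defined_out(m, n, ns=parent):
    """Model `.x defined_out m, n:`; generic inputs, output fixed to the parent's x."""
    ns["x"] = m + n                 # .x sum(m, n)

parent["y"] = part_func(parent["y"])   # y part_func y
defined_out(8, 1024)                   # defined_out 8, 1024
print(parent["y"], parent["x"])
```

With x starting at 15 and y at 12, the sketch leaves y at 27 and x at 1032.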
The header convention is a simple but powerful way to implement a lot of the functionality of a high-level language with only one syntactic convention. One final note about namespaces is that the .* symbol references the entire parent namespace. This allows one to implement transparent code blocks like loops, e.g.:
x 0
x_max 11
.* loop_block .*:
   continue .x < .x_max
   print .x
   .x .x + 1
   loop True
   :
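For comparison, here is how the same control flow might look in Python, treating 'continue' as a jump to the end of the block when its test fails and 'loop True' as an unconditional jump back to the start. The parent dict and the printed list are assumptions made for the sketch; px itself would work on the parent namespace directly.

```python
# Hypothetical Python rendering of the px loop block; a dict models the parent namespace.
parent = {"x": 0, "x_max": 11}
printed = []                               # record of what `print .x` emitted

while True:                                # $start of the block
    if not (parent["x"] < parent["x_max"]):
        break                              # continue: goto $end when the test is false
    print(parent["x"])                     # print .x
    printed.append(parent["x"])
    parent["x"] = parent["x"] + 1          # .x .x + 1
    continue                               # loop True: goto $start unconditionally
```

Run as written, this prints 0 through 10 and leaves x at 11 in the parent namespace.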
Because the input and output are fully defined, the code executes and can access any of the variables, functions, etc. of the parent block via the dot syntax. The loop and continue functions control the flow of the block and can be used in any block. 'loop' is just a "goto $start if expression is true" statement, and 'continue' is a "goto $end if expression is false" statement. This block first checks its condition, breaking if it is false, does some more stuff and then loops back to the start unconditionally. Alternatively, one might omit the 'continue' statement and do the condition check with the 'loop' statement. Every block will have a start and end tag that can be used in this manner.

Objects

Because blocks are self-contained namespaces, there is another possible use for them: classes and objects.
. some_object .:
   name "some_object"
   value 42
This block didn't specify any inputs or outputs. It is thus a concrete object which can be used as follows:
But maybe I'm getting ahead of myself! Stay tuned for more developments, and please give suggestions on the name; I'm not sure how or why I came up with "px".
