amXor

Blogging about our lives online.

2.11.2010

px Language - Parsing Challenges

After spending most of the day investigating parsers, lexers and compiler compilers, I've come to the conclusion that my language doesn't really need these just yet. I've really only outlined three syntactic rules and two structural ones, and these cover all the ground I can see in front of me.

Syntax

1. Right to left evaluation, except inside parentheses:

z  y if (y > x)
z  x if (y 

2. Right to left definition of block header line:

z function z, y:
   z do_something y
   :

3. Comma separated values are globbed together into an array.

process_list x, y, 12, 72, "Hello World!"
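
As a rough illustration of rules 1 and 3, here is a minimal Python sketch, not the actual px implementation. The helper names (tokenize, glob_commas, eval_right_to_left) are hypothetical, and it assumes that a statement is a chain of unary function names ending in a value, with no parentheses or conditionals:

```python
import re

# Quoted strings, bare words/numbers, and commas are the only tokens here.
TOKEN = re.compile(r'"[^"]*"|[^\s,]+|,')


def tokenize(line):
    """Split a line into tokens, keeping quoted strings whole."""
    return TOKEN.findall(line)


def glob_commas(tokens):
    """Group comma-joined tokens into lists; lone tokens stay as-is."""
    items = []
    i = 0
    while i < len(tokens):
        group = [tokens[i]]
        i += 1
        # Keep absorbing ", value" pairs into the current group.
        while i + 1 < len(tokens) and tokens[i] == ",":
            group.append(tokens[i + 1])
            i += 2
        items.append(group if len(group) > 1 else group[0])
    return items


def eval_right_to_left(items, functions):
    """Start from the rightmost item and apply each name to its left."""
    value = items[-1]
    for name in reversed(items[:-1]):
        value = functions[name](value)
    return value


items = glob_commas(tokenize('process_list x, y, 12, 72, "Hello World!"'))
print(items)
# Stand-in for process_list: just count the globbed values.
print(eval_right_to_left(items, {"process_list": len}))
```

With this sketch the five comma separated values arrive at process_list as a single list, which is the "globbing" described in rule 3.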

That's basically it for the syntax. Where it gets interesting is namespaces and typing.

Structure

Every code block defines its own namespace by default. This means that every block of code is explicit in defining what values it is using. Here's a trivial block which takes two operands and produces another:

.z a_function .z, y:
   .z y + .z

The ".", and the lack thereof, are the essential parts when it comes to namespaces. You might have seen the dot syntax in other languages (my_array.append()), and it's not really that different in px, except that referencing variables is a two-way street. a_function declares its own namespace under the main program and thus becomes main.a_function. That's fine: everything in the main namespace can access it directly by its bare name, and we can keep stacking namespaces just like in other languages.

But the trick here is the preceding '.' on the variable name z and the lack of it on y. What's actually going on here? The function is defined with concrete references to the z in the parent namespace and one reference to a y which does not exist yet. This makes it a unary, partially applied function that is never called until it is supplied with a y. When it does get called with a value, the z in the parent namespace gets updated accordingly. This function might be more appropriately called "increment_z". Let's see it in action:

.z 10
increment_z  1
print z             >>11
increment_z  8
print z             >>19
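
For readers more at home in a mainstream language, the same behaviour can be approximated in Python. This is only a sketch under assumptions: the parent namespace is modelled as a plain dict, and bind is a hypothetical helper that plays the role of the '.' prefix by tying a name to that namespace:

```python
def bind(namespace, name, op):
    """Return a unary function that updates namespace[name] in place."""
    def apply(y):
        # y is the missing operand; namespace[name] is the bound ".z".
        namespace[name] = op(y, namespace[name])
        return namespace[name]
    return apply


main = {"z": 10}                                   # .z 10
increment_z = bind(main, "z", lambda y, z: y + z)  # .z y + .z

increment_z(1)
print(main["z"])  # 11
increment_z(8)
print(main["z"])  # 19
```

Nothing happens when increment_z is defined; the parent's z only changes once a y is supplied, which matches the partially applied reading above.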

A standard for loop imports all names, but it does so explicitly in its function definition. "Hold on," you might ask, "its function definition???" How does a for loop have a function definition? Don't other languages define this as built in to the language? The answer is yes, but in px I have decided to expose all the gory details of language design to the programmer, and only hardcode the bare essentials. This is roughly how the standard loops are implemented as blocks:

.* loop .*, _start, _end:
   break _end condition
   continue _start condition
   :

The _start and _end tags are handles for the final assembly language implementation. The * wildcard states that any and all of the parent namespace will be accessible. This block definition has no pre-determined outputs or effects and so it's considered a function template. If called, the break and continue expressions will only be included if their conditions are met. Here's how that might look:

loop:
   break if .x 

Loop doesn't actually control the looping, the statements within it do. We could have omitted the break statement and used the condition of the continue to control the loop. Notice again that this block is a fully defined namespace and all external references must be used accordingly.
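
The idea that the statements inside the block, not the loop itself, decide when to stop can be sketched in Python. This is an illustration rather than how px would compile it: BREAK and CONTINUE are hypothetical signal values, the namespace is a plain dict, and the stopping condition (x reaching 5) is made up for the example:

```python
BREAK, CONTINUE = "break", "continue"


def loop(namespace, body):
    """Re-run body until it signals BREAK; loop holds no condition itself."""
    while body(namespace) is not BREAK:
        pass
    return namespace


def count_to_five(ns):
    # The body owns the control flow: it decides when to break.
    if ns["x"] >= 5:
        return BREAK
    ns["x"] += 1
    return CONTINUE


result = loop({"x": 0}, count_to_five)
print(result["x"])  # 5
```

Here loop is just a template that jumps back to the start; swapping in a different body changes the looping behaviour without touching loop.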

To summarize namespaces, block = namespace = object = method = class = control mechanism = symbol. If a block has fully defined inputs and outputs, it is treated just like any other literal value in the flow.

Types

I've stated that there are two structural elements to my language: Namespaces and Types. Well, I kind of lied: there's only one. Types are also implemented functionally, but I haven't worked out how, or how much of the checking can be done in the final assembly code. The general idea is this: values that aren't known at compile time are put on the heap and indexed with a hash table. Values that are known (by single literal assignment, or explicit typecasting) are put on the stack / data segment. Type statements are just like any unary operator:

x number 225.993
y int 235
z string "Please assign me to z!"

They go right next to the value being assigned as a check for the correctness of the data being passed to it.
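
One way to picture type statements as unary operators is as checking functions that pass a value through unchanged or reject it. The Python sketch below assumes that reading; the function names and error messages are hypothetical (int is spelled integer to avoid shadowing Python's built-in), and it says nothing about the heap versus stack placement described above:

```python
def number(value):
    """Pass through ints and floats; reject everything else."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"expected a number, got {value!r}")
    return value


def integer(value):
    """Pass through ints only (bool is excluded on purpose)."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an int, got {value!r}")
    return value


def string(value):
    """Pass through strings only."""
    if not isinstance(value, str):
        raise TypeError(f"expected a string, got {value!r}")
    return value


x = number(225.993)                    # x number 225.993
y = integer(235)                       # y int 235
z = string("Please assign me to z!")   # z string "Please assign me to z!"
```

Because each check returns its argument, it slots into a right to left chain like any other unary function.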
