Cognition: A new antisyntax language redefining metaprogramming

https://ret2pop.nullring.xyz/blog/cognition.html

1. The problem

Lisp programmers claim that their system of s-expression code in addition to its featureful macro system makes it a metaprogrammable and generalized system. This is of course true, but there's something very broken with lisp: metaprogramming and programming aren't the same thing, meaning there will always be rigid syntax within lisp (its parentheses or the fact that it needs to have characters that tell lisp to read ahead). The left parenthesis tells lisp that it needs to keep on reading until the right parenthesis in order to finish some process that allows it to stop and evaluate the whole expression. This makes the left and right parenthesis unchangeable from within the language (not conceptually, but under some implementations it is not possible), and, more importantly, it makes the process of retroactively changing the sequence in which these tokens are delimited impossible without a heavy amount of string processing. Other languages have other ways in which they need to read ahead when they see a certain token in order to decide what to do. This process of having a program read ahead based on current input is called syntax.

And as long as you read ahead, or assume a default way of reading ahead, you fall into the trap of having some form of syntax. Cognition is different in that it uses an antisyntax that is fully postfix. This has similarities with concatenative programming languages, but concatenative programming languages also suffer from two main problems: first, the introduction of the left and right bracket characters (which is in fact prefix notation, as it needs to read ahead of the input stream), and second, the quote character for strings. This is unsuitable for such a general language. You can even see the same problem in lisp's C syntax implementation: escape characters everywhere, awkward must-have spaces delimit the start and end of certain tokens (and if not, it requires post-processing). The Racket programming language has its macro system, but it is not runtime dynamic. It still utilizes preprocessing.

So, what's the precise solution to this conundrum? Well, it's beautiful; but it requires some cognition.

2. Introduction

Cognition is an active research project that Matthew Hinton and I have been working on for the past couple of months. Although my commit history for this project has not been impressive, we came up with a lot of the theory together, working alongside each other in order to achieve one of the most generalized systems of syntax we know of. Let's take a look at the conceptual reason why cognition needs to exist, as well as some baremetal cognition code (you'll see what I mean by this later). There's a paper about this language available in the repository, for those interested. Understanding cognition might require a lot of background in parsing, tokenization, and syntax, but I've done my best to write this in a very understandable way. The repository is available at https://github.com/metacrank/cognition, for your information.

Figure 1: The Cognition programming language, logo designed by Matthew Hinton

3. Baremetal Cognition

Baremetal cognition has a couple of peculiar attributes, and it is remarkably like the Brainfuck programming language. But unlike its look-alike, it has the ability to do some serious metaprogramming. Let's take a look at what the bootstrapping code for a very minimal syntax looks like:

ldfgldftgldfdtgl
df 
 
dfiff1 crank f

And do note the whitespace (line 2 has a trailing space after df, line 3 consists of a single space, and the newlines matter). Erm, okay. What?

So, our goal in this post is to get from a syntax that looks like that to a syntax that looks like Stem. But how on earth does this piece of code even work? Well, we have to introduce two new ideas: delimiters, and ignores.

3.1. Tokenization

Delimiters allow the tokenizer to figure out when one token ends and another begins. The list of single character tokenizers is public, allowing that list to be modified and read from within cognition itself. Ignored characters are characters that are completely ignored by the tokenizer in the first stage of every read-eval-print loop; that is, at the start of collecting the token, it first skips a set of ignored characters. By default, every single character is a delimiter, and no characters are ignored characters. The delimiter and ignored characters list allows you to toggle a flag to tell it to blacklist or whitelist the given characters, adding brevity (and practicality) to the language.

Let's take the first line of code as an example:

ldfgldftgldfdtgl

because of the delimiter and ignored rules set by default, every single character is read as a token, and no character is skipped. We therefore read the first character, l. By default, Cognition works off a stack-based programming language design. If you're not familiar, see the Stem blogpost for more detail (in fact if you're not familiar this won't work as an explanation for you, so you should see it, or read up on the Forth programming language). Though, we call them containers, as they are more general than stacks. Additionally, in this default environment, no word is executed except for special faliases, as we will cover later.

Therefore, the character l gets read in and is put on the stack. Then, the character d is read in and put on the stack. But f is different. In order to execute words in Cognition, we must take a look at the falias system.

3.2. Faliases

Faliases are a list of words that get executed when they are put on the stack, or container as we will call it in the future. All of them in fact execute the equivalent of eval in stem but as soon as they are put on their container. Meaning, when f, the default falias, is run, it doesn't go on the container, but rather executes the top of the container which is d. d changes the delimiter list to the string value of a word, meaning that it changes the delimiters to blacklist only the character l as a delimiter. Everything else by default is a delimiter because everything by default is parsed into single character words.

3.3. Delimiter Caveats

Delimiters have an interesting rule, and that is that the delimiter character is excluded from the tokenized word unless we have not ignored a character in the tokenization loop, in which case we collect the character as a part of the current token and keep going. This is in contrast to a third kind of tokenization category called the singlet, which includes itself into a token before skipping itself and ending the tokenization collection.

In addition, remember what I said about the blacklist? Well, you can toggle between blacklisting and whitelisting your list of delimiters, singlets, and ignored characters. By default, there are no blacklisted delimiters, no whitelisted singlets, and no whitelisted ignored characters.

We then also observe that all other characters will simply skip themselves while being collected as a part of the current token, without ending this loop, therefore collecting new characters until the loop halts via delimiter or singlet rules.

3.4. Continuing the Bootstrap Code

So far, we looked at this part of the code:

ldf

which simply creates l as a non-delimiter. Now, for the rest of the code:

gldftgldfdtgl
df 
 
dfiff1 crank f

gldf puts gl on the stack due to d being a delimiter, and f is called on it, meaning that now g and l are the only non-delimiters. Then, tgl gets put on the stack and they become non-delimiters with df. dtgl gets put on the stack, and the newline becomes the only non-delimiter with \ndf (yes, the newline is actually a part of the code here, and spaces need to be as well in order for this to work). Then, the space character and \n get put on the stack together as a single word: the space is collected because of how delimiter rules work (if you don't ignore, the first character is parsed normally even if it is a delimiter), and \n is collected because it is now a non-delimiter. Then, another \ \n word is tokenized (you might not see it, but there's another space on line 3). The current stack looks like this (bottom to top):

3. dtgl
2. [space char]\n
1. [space char]\n

df sets the non-delimiters to \ \n. if sets the ignores to \ \n, which ignores these characters at the start of tokenization. f executes dtgl, which is a word that toggles the dflag, the flag that stores the whitelist/blacklist distinction for delimiters. Now, all non-delimiters are delimiters and all delimiters are non-delimiters. Finally, we're put in an environment where spaces and newlines are the delimiters for tokens, and they are ignored at the start of tokenizing a token. Next, 1 is tokenized and put on the stack, and then the crank word, which is then executed by f (the 1 token is treated as a number in this case, but everything textual in cognition is a word). We are done with our bootstrapping sequence! Now, you might wonder what crank does. That we will explain in a later section.
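
To make the walkthrough above concrete, here is a toy model of it in Python (not Cognition, and not how the real interpreter is written). It only knows the words d, i, dtgl and crank plus the falias f, and it follows the rules as described in this section: the delimiter list starts empty in blacklist mode, ignores are a whitelist, and the first character of a token is collected unconditionally when nothing was ignored. The class name ToyCognition, its fields, and the handling of a delimiter that directly follows ignored characters are assumptions made for this sketch. The source string spells out the whitespace explicitly: a trailing space on line 2 and a line 3 that is a single space.

```python
class ToyCognition:
    """Toy model of the bootstrap walkthrough; not the real implementation."""

    def __init__(self, source):
        self.source = source
        self.pos = 0
        self.stack = []
        self.delims = set()           # delimiter list, initially empty
        self.delims_blacklist = True  # dflag: listed chars are NON-delimiters
        self.ignores = set()          # whitelist: listed chars are ignored
        self.crank = 0                # 0: nothing is evaluated automatically
        self.tokens = []              # every token seen, kept for inspection

    def is_delim(self, ch):
        listed = ch in self.delims
        return not listed if self.delims_blacklist else listed

    def next_token(self):
        src = self.source
        skipped = False
        # Stage 1: skip ignored characters at the start of the token.
        while self.pos < len(src) and src[self.pos] in self.ignores:
            self.pos += 1
            skipped = True
        if self.pos >= len(src):
            return None
        token = ""
        # If nothing was ignored, the first character is collected even if
        # it is a delimiter.
        if not skipped:
            token += src[self.pos]
            self.pos += 1
        # Stage 2: collect until the next delimiter, which is left unread.
        while self.pos < len(src) and not self.is_delim(src[self.pos]):
            token += src[self.pos]
            self.pos += 1
        if token == "":
            # Assumption for this toy only: a delimiter right after ignored
            # characters becomes a one-character token so we always advance.
            token = src[self.pos]
            self.pos += 1
        return token

    def execute(self, word):
        if word == "d":        # set the delimiter list from the next word
            self.delims = set(self.stack.pop())
        elif word == "i":      # set the ignore list from the next word
            self.ignores = set(self.stack.pop())
        elif word == "dtgl":   # toggle blacklist/whitelist for delimiters
            self.delims_blacklist = not self.delims_blacklist
        elif word == "crank":  # only recorded here, not acted upon
            self.crank = int(self.stack.pop())
        else:
            raise ValueError(f"toy does not know the word {word!r}")

    def run(self):
        while (token := self.next_token()) is not None:
            self.tokens.append(token)
            if token == "f":   # the default falias: execute the top word
                self.execute(self.stack.pop())
            else:
                self.stack.append(token)


vm = ToyCognition("ldfgldftgldfdtgl\ndf \n \ndfiff1 crank f")
vm.run()
print(vm.tokens)
print(sorted(vm.delims), vm.delims_blacklist, sorted(vm.ignores), vm.crank)
```

In this toy the multi-character tokens gl, tgl, dtgl and the two space-plus-newline words appear in the order the walkthrough gives, and the final state has space and newline as whitelisted delimiters and as ignores, with the crank set to 1.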

4. Bootstrapping Takeaways

From this, we see a couple principles: first, cognition is able to change how it tokenizes on the fly and it can do it programmatically, allowing you to program a program in cognition that would theoretically automate the process of changing these delimiters, singlets, and ignores. This is something impossible in other languages, being able to program your own tokenizer for some foreign language from within cognition, and have future code be tokenized exactly like how you want it to be. This is solely possible because the language is postfix and doesn't read ahead, so it doesn't require more than one token to be parsed before an expression is evaluated. Second, faliases allow us to execute words without having to have prefix words or any default execution of words.

5. Crank

The metacrank system allows us to set a default way in which tokens are executed on the stack. The crank word takes a number as its argument and by effect executes the top of the stack for every n words you put on the stack. To make this concept concrete, let's look at some code (running from what we call crank 1 as we set our environment to crank one at the end of the bootstrapping sequence):

5 crank 2crank 2 crank
1 crank unglue swap quote prepose def

the crank 1 environment allows us to stop using f in order to evaluate tokens. Instead, every 1 token that is tokenized is evaluated. Since we programmed in a newline and space-delimited syntax, we can safely interpret this code intuitively.

The code begins by trying to evaluate 5, which evaluates to itself as it is not a builtin. crank evaluates and puts us in 5 crank, meaning every 5th token evaluates from here on. 2crank, 2, crank, 1 are all put on the stack, leaving us with a stack that looks like so (notice that crank doesn't get executed even though it is a builtin because we set ourselves to using crank 5):

4. 2crank
3. 2
2. crank
1. 1

crank is the 5th word, so it executes. Note that this puts us back in crank 1, meaning every word is evaluated. unglue is a builtin that gets the value of the word at the top of the stack (as 1 is used up by the crank we evaluated), and so it gets the value of crank, which is a builtin. What that in effect does is it gets the function pointer associated with the crank builtin. Our new stack looks like this:

3. 2crank
2. 2
1. [CLIB]

Where CLIB is our function pointer that points to the crank builtin. We then swap:

3. 2crank
2. [CLIB]
1. 2

then quote, a builtin that quotes the top thing on the stack:

3. 2crank
2. [CLIB]
1. [2]

then prepose, a builtin like compose in stem, except that it preposes and that it puts things in what we call a VMACRO:

2. 2crank
1. ( [2] [CLIB] )

then we call def. This defines a word 2crank that puts 2 on the stack and then calls a function pointer pointing us to the crank builtin. Now, we still have to define what VMACROs are, and in order to do that we might have to explain some differences between the cognition stack and the stem stack.
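
The counting in the first part of this walkthrough can be checked with a small Python toy (again a sketch, not the real cranker). It assumes that the counter restarts whenever crank runs, that a stored token is simply pushed, and that an evaluated word which is not a builtin stays on the stack as itself. crank is the only builtin it knows, so it is only fed the tokens up to the second 1 crank; run_crank and its return values are names made up for this example.

```python
def run_crank(tokens, crank=1):
    """Toy cranker: every `crank`-th token is evaluated, the rest are stored."""
    stack, trace, counter = [], [], 0
    for token in tokens:
        stack.append(token)
        counter += 1
        evaluated = crank > 0 and counter >= crank
        trace.append((token, evaluated))
        if evaluated:
            counter = 0
            word = stack.pop()
            if word == "crank":
                # The only builtin in this toy: take the number below it.
                crank = int(stack.pop())
            else:
                # Non-builtins evaluate to themselves.
                stack.append(word)
    return stack, trace, crank


stack, trace, crank = run_crank("5 crank 2crank 2 crank 1 crank".split())
for token, evaluated in trace:
    print(f"{token:8} {'evaluated' if evaluated else 'stored'}")
print(stack, crank)
```

Under these assumptions 2crank, 2, crank and 1 are stored, the fifth token is evaluated and consumes the 1, and the stack is left holding 2crank, 2, crank with the environment back in crank 1, which is the state unglue starts from.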

5.1. Differences

In the stem stack, putting words on the stack directly is allowed. In cognition, words are put in containers when they are put on the stack and not evaluated. This means words like compose in stem work on words (or more accurately containers with a single word in them) as well as other containers, making the API for this language more consistent. Additionally, words like cd make use of this concept.

5.1.1. Macros

Macros are another difference between stem quotes and cognition containers. When macros are evaluated, everything in the macro is evaluated, ignoring the crank. If bound to a word, evaluating that word evaluates the macro which will ignore the crank completely and will only increment the cranker by one, while evaluating each statement in the macro. They are useful for making crank-agnostic code, and expanding macros is very useful for the purpose of optimization, although we will actually have to write the word expand from more primitive words later on (hint: it uses recursive unglue).

5.2. More Code

Here is the rest of the code in bootstrap.cog in coglib/:

getd dup _ concat _ swap d i 
_quote_swap_quote_compose_swap_dup_d_i eval 
2crank ing 0 crank spc
2crank ing 1 crank swap quote def
2crank ing 0 crank endl
2crank ing 1 crank swap quote def
2crank ing 1 crank
2crank ing 3 crank load ../coglib/ quote
2crank ing 2 crank swap unglue concat unglue fread unglue evalstr unglue
2crank ing 1 crank compose compose compose compose VMACRO cast def
2crank ing 1 crank
2crank ing 1 crank getargs 1 split swap drop 1 split drop
2crank ing 1 crank
2crank ing 1 crank epop drop
2crank ing 1 crank INDEX spc OUT spc OF spc RANGE
2crank ing 1 crank concat concat concat concat concat concat =
2crank ing 1 crank
2crank ing 1 crank missing spc filename concat concat dup endl concat
2crank ing 1 crank swap quote swap quote compose
2crank ing 2 crank print compose exit compose
2crank ing 1 crank
2crank ing 0 crank fread evalstr
2crank ing 1 crank compose
2crank ing 1 crank
2crank ing 1 crank if

Okay, well, the syntax still doesn't look so good, and it's still pretty hard to get what this is doing. But the basic idea is that 2crank is a macro and is therefore crank agnostic, and we guarantee its execution with ing, another falias (because it's funny). Then, we execute an n crank, which standardizes what crank each line is in (you might wonder what ing and f's interaction is with the cranker. It actually just guarantees the evaluation of the previous thing, so if the previous thing already evaluated f and ing both do nothing). In any case, this defines words that are useful, such as load, which loads something from the coglib. It does this by compose-ing things into quotes and then def-ing those quotes.

The crank, and by extension, the metacrank system is needed in order to discriminate between evaluating some tokens and storing others for metaprogramming without having to use f, while also keeping the system postfix. Crank is just one word that allows for this type of behavior; the more general word, metacrank, allows for much more interesting kinds of syntax manipulation. We have examples of metacrank down the line, but for now I should explain the metacrank word.

5.3. Metacrank

n m metacrank sets a periodic evaluation m for an element n items down the stack. The crank word is therefore equivalent to 0 m metacrank. Only one token can be evaluated per tokenized token, although every metacrank is incremented per token, where lower metacranks get priority. This means that if you set two different metacranks, only one of them can execute per token tokenized, and the lower metacrank gets priority. Note that metacrank and, by extension, crank, don't just depend on tokenized words; they also work while evaluating word definitions recursively, meaning if a word is evaluated in 2 crank, one out of two words will execute in each level of the evaluation tree. You can play around with this in the repl to get a sense of how it works: run ../crank bootstrap.cog repl.cog devel.cog load in the coglib folder, and use stem like syntax in order to define a function. Then, run that function in 2 crank. You will see how the evaluation tree respects cranking in the same way that the program file itself does.

Metacrank allows for not only metaprogramming in the form of code building, but also direct syntax manipulation (i.e. I want to execute this token once I have read n other token(s)). The advantages to this system compared to other programming languages' systems are clear: you can program a prefix word and undef it when you want to rip out that part of syntax. You can write a prefix character that doesn't stop at an ending character but always stops when you read a certain number of tokens. You can feed user input into a math program and feed the output into a syntax system like metacrank. The possibilities are endless! And with that, we will slowly build up the stem programming language, v2, now with macros and from within our own cognition.

6. The Stem Dialect, Improved

In this piece of code, we define the comment:

2crank ing 0 crank ff 1
2crank ing 1 crank cut unaliasf
2crank ing 0 crank 0
2crank ing 1 crank cut swap quote def
2crank ing 0 crank
2crank ing 0 crank #
2crank ing 0 crank geti getd gets crankbase f d f i endl s
2crank ing 1 crank compose compose compose compose compose compose compose compose compose
2crank ing 0 crank drop halt crank s d i
2crank ing 1 crank compose compose compose compose compose VMACRO cast quote compose
2crank ing 0 crank halt 1 quote ing 1 quote ing metacrank
2crank ing 1 crank compose compose compose compose VMACRO cast
2crank ing 1 crank def
2crank ing 2 crank # singlet # delim
2crank ing 1 crank #comment: geti getd gets crankbase '' d '' i '\n' s ( drop halt crank s d i ) halt 1 1 metacrank

and it is our first piece of code that builds something truly prefix. The comment character is a prefix that drops all the text before the newline character, which is a type of word that tells the parser to read ahead. This is our first indication that everything that we thought was possible within cognition truly is.

But before that, we can look at the first couple of lines:

2crank ing 0 crank ff 1
2crank ing 1 crank cut unaliasf
2crank ing 0 crank 0
2crank ing 1 crank cut swap quote def
2crank ing 0 crank

which simply unaliases f from the falias list, with ing being the only remaining falias. In cognition, even these faliases are changeable.

Since we can't put f directly on the stack (if we try by just using f, it would execute), we instead utilize some very minimal string processing to do it, putting ff on the stack and then cutting the string in half to get two copies of f. We then want f to mean false, which in cognition is just an empty word. Therefore, we make an empty word by calling 0 cut on this string, and then def-ing f to the empty string. The following code is where the comment is defined:

2crank ing 0 crank #
2crank ing 0 crank geti getd gets crankbase f d f i endl s
2crank ing 1 crank compose compose compose compose compose compose compose compose compose
2crank ing 0 crank drop halt crank s d i
2crank ing 1 crank compose compose compose compose compose VMACRO cast quote compose
2crank ing 0 crank halt 1 quote ing 1 quote ing metacrank
2crank ing 1 crank compose compose compose compose VMACRO cast
2crank ing 1 crank def
2crank ing 2 crank # singlet # delim
2crank ing 1 crank #comment: geti getd gets crankbase '' d '' i '\n' s ( drop halt crank s d i ) halt 1 1 metacrank

Relevant: halt just puts you in 0 for all metacranks, and VMACRO cast just turns the top thing on the stack from a container to a macro. geti, getd, gets get the ignores, delims, and singlets respectively as a string; drop is dsc in stem. singlet and delim set the singlets and delimiters. endl is defined within bootstrap.cog and just puts the newline character as a word on the stack. crankbase gets the current crank.

we call a lot of compose words in order to build this definition, and we make the # character a singlet delimiter in order to allow for spaces after the comment. We put ourselves in 1 1 metacrank in the # definition while altering the tokenization rules beforehand in order to tokenize everything until a newline as a token while calling # on said word in order to effectively drop that comment and get ourselves back in the original crank and metacrank. Thus, the brilliant # character is written, operating on a token that is tokenized in the future, with complete default postfix syntax. With the information above, one can work out the specifics of how it works; the point is that it does, and one can test that it does by going into the coglib folder and running ../crank bootstrap.cog repl.cog devel.cog load, which will load the REPL and load devel.cog, which will in turn load comment.cog.

6.1. The Great Escape

Here, we accelerate our way out of this primitive syntax, and it all starts with the great escape character. We make many great leaps in this section that aren't entirely explained for the sake of brevity, but you are free to play around with all of these things by using the repl. In any case, I hope you will enjoy this great leap in syntax technology; by the end, we will have reached something with real structure.

Here we define a preliminary prefix escape character. Also you will notice that 2crank ing 0 crank is used as padding between lines:

2crank ing 2 crank comment.cog load
2crank ing 0 crank
2crank ing 1 crank # preliminary escape character \
2crank ing 1 crank \
2crank ing 0 crank halt 1 quote ing crank
2crank ing 1 crank compose compose
2crank ing 2 crank VMACRO cast quote eval
2crank ing 0 crank halt 1 quote ing dup ing metacrank
2crank ing 1 crank compose compose compose compose
2crank ing 2 crank VMACRO cast
2crank ing 1 crank def
2crank ing 0 crank
2crank ing 0 crank

This allows for escaping so that we can put something on the stack even if it is to be evaluated, but we want to redefine this character eventually to be compatible with stem-like quotes. We're even using our comment character in order to annotate this code by now! Here is the full quote definition (once we have this definition, we can use it to improve itself):

2crank ing 0 crank [
2crank ing 0 crank
2crank ing 1 crank # init
2crank ing 0 crank crankbase 1 quote ing metacrankbase dup 1 quote ing =
2crank ing 1 crank compose compose compose compose compose
2crank ing 0 crank
2crank ing 1 crank # meta-crank-stuff0
2crank ing 3 crank dup ] quote =
2crank ing 1 crank compose compose
2crank ing 16 crank drop swap drop swap 1 quote swap metacrank swap crank quote
2crank ing 3 crank compose dup quote dip swap
2crank ing 1 crank compose compose compose compose compose compose compose compose
2crank ing 1 crank compose compose compose compose compose \ VMACRO cast quote compose
2crank ing 3 crank compose dup quote dip swap
2crank ing 1 crank compose compose compose \ VMACRO cast quote compose \ if compose
2crank ing 1 crank \ VMACRO cast quote quote compose
2crank ing 0 crank
2crank ing 1 crank # meta-crank-stuff1
2crank ing 3 crank dup ] quote =
2crank ing 1 crank compose compose
2crank ing 16 crank drop swap drop swap 1 quote swap metacrank swap crank
2crank ing 1 crank compose compose compose compose compose compose compose compose \ VMACRO cast quote compose
2crank ing 3 crank compose dup quote dip swap
2crank ing 1 crank compose compose compose \ VMACRO cast quote compose \ if compose
2crank ing 1 crank \ VMACRO cast quote quote compose
2crank ing 0 crank
2crank ing 1 crank # rest of the definition
2crank ing 16 crank if dup stack swap 0 quote crank
2crank ing 2 crank 1 quote 1 quote metacrank
2crank ing 1 crank compose compose compose compose compose compose compose compose
2crank ing 1 crank compose \ VMACRO cast
2crank ing 0 crank
2crank ing 1 crank def

Um, it's quite the spectacle how Matthew Hinton ever came up with this thing, but alas, it exists. Then, we use it in order to redefine itself, but better as the old quote definition can't do recursive quotes (we can do this because the definition is used before you redefine the word due to postfix def, a development pattern seen often in low level cognition):

\ [
[ crankbase ] [ 1 ] quote compose [ metacrankbase dup ] compose [ 1 ] quote compose [ = ] compose
[ dup ] \ ] quote compose [ = ] compose
[ drop swap drop swap ] [ 1 ] quote compose [ swap metacrank swap crank quote compose ] compose
[ dup ] quote compose [ dip swap ] compose \ VMACRO cast quote compose
[ dup dup dup ] \ [ quote compose [ = swap ] compose \ ( quote compose [ = or swap ] compose \ \ quote compose [ = or ] compose
[ eval ] quote compose
[ compose ] [ dup ] quote compose [ dip swap ] compose \ VMACRO cast quote compose [ if ] compose \ VMACRO cast
quote compose [ if ] compose \ VMACRO cast quote quote
[ dup ] \ ] quote compose [ = ] compose
[ drop swap drop swap ] [ 1 ] quote compose [ swap metacrank swap crank ] compose \ VMACRO cast quote compose
[ dup dup dup ] \ [ quote compose [ = swap ] compose \ ( quote compose [ = or swap ] compose \ \ quote compose [ = or ] compose
[ eval ] quote compose
[ compose ] [ dup ] quote compose [ dip swap ] compose \ VMACRO cast quote compose [ if ] compose \ VMACRO cast
quote compose [ if ] compose \ VMACRO cast quote quote
compose compose [ if dup stack swap ] compose [ 0 ] quote compose [ crank ] compose
[ 1 ] quote dup compose compose [ metacrank ] compose \ VMACRO cast
def

Okay, so now we can use recursive quoting, just like in stem. But there are still a couple things missing that we probably want: a good string quote implementation, and probably escape characters that work in the brackets. Also, since Cognition utilizes macros, we probably want a way to notate those as well, and we probably want a way to expand macros. We can do all of that! First, we will have to redefine \ once more:

\ \
[ [ 1 ] metacrankbase [ 1 ] = ]
[ halt [ 1 ] [ 1 ] metacrank quote compose [ dup ] dip swap ]
\ VMACRO cast quote quote compose
[ halt [ 1 ] crank ] VMACRO cast quote quote compose
[ if halt [ 1 ] [ 1 ] metacrank ] compose \ VMACRO cast
def

This piece of code defines the bracket but for macros (split just splits a list into two):

\ (
\ [ unglue
[ 11 ] split swap [ 10 ] split drop [ macro ] compose
[ 18 ] split quote [ prepose ] compose dip
[ 17 ] split eval eval
[ 1 ] del [ \ ) ] [ 1 ] put
quote quote quote [ prepose ] compose dip
[ 16 ] split eval eval
[ 1 ] del [ \ ) ] [ 1 ] put
quote quote quote [ prepose ] compose dip
prepose
def

We want these macros to automatically expand because it's more efficient to bind already expanded macros to words, and they functionally evaluate identically (isdef just returns a boolean where true is a non-empty string, false is an empty string, if a word is defined):

\ (
( crankbase [ 1 ] metacrankbase dup [ 1 ] =
  [ ( dup \ ) =
      ( drop swap drop swap [ 1 ] swap metacrank swap crank quote compose ( dup ) dip swap )
      ( dup dup dup \ [ = swap \ ( = or swap \ \ = or
        ( eval )
        ( dup isdef ( unglue ) [ ] if compose ( dup ) dip swap )
        if )
      if ) ]
  [ ( dup \ ) =
      ( drop swap drop swap [ 1 ] swap metacrank swap crank )
      ( dup dup dup \ [ = swap \ ( = or swap \ \ = or
        ( eval )
        ( dup isdef ( unglue ) [ ] if compose ( dup ) dip swap )
        if )
      if ) ]
  if dup macro swap
  [ 0 ] crank [ 1 ] [ 1 ] metacrank ) def

and you can see that as we define more things, our language is beginning to look more or less like it has syntax! In this quote.cog file which we have been looking at, there are more things, but the bulk of it is pretty much done. From here on, I will just explain the syntax programmed by quote.cog instead of showing the specific code.

As an example, here is expand:

# define basic expand (works on nonempty macros only)
[ expand ]
( macro swap
  ( [ 1 ] split
    ( isword ( dup isdef ( unglue ) ( ) if ) ( ) if compose ) dip
    size [ 0 ] > ( ( ( dup ) dip swap ) dip swap eval ) ( ) if )
  dup ( swap ( swap ) dip ) dip eval drop swap drop ) def
# complete expand (checks for definitions within child first without copying hashtables)
[ expand ]
( size [ 0 ] > ( type [ VSTACK ] = ) ( return ) if ?
  ( macro swap
    macro
    ( ( ( size dup [ 0 ] > ) dip swap ) dip swap
      ( ( ( 1 - dup ( vat ) dip swap ( del ) dip ) dip compose ) dip dup eval )
      ( drop swap drop )
      if ) dup eval
    ( ( [ 1 ] split
        ( isword
          ( compose cd dup isdef
            ( unglue pop )
              ( pop dup isdef ( unglue ) ( ) if )
            if ) ( ) if
          ( swap ) dip compose swap ) dip
        size [ 0 ] > ) dip swap
      ( dup eval ) ( drop drop swap compose ) if ) dup eval )
  ( expand )
  if ) def

Which recursively expands word definitions inside a quote or macro, using the word unglue. We've used the expand word in order to redefine itself in a more general case.

7. The Brainfuck Dialect

And returning to whence we came, we define the Brainfuck dialect with our current advanced stem dialect:

comment.cog load
quote.cog load
[ ] [ ] [ 0 ]
[ > ] [[ swap [[ compose ]] dip size [ 0 ] = [ [ 0 ] ] [[ [ 1 ] split swap ]] if ]] def
[ < ] [[ prepose [[ size dup [ 0 ] = [ ] [[ [ 1 ] - split ]] if ]] dip swap ]] def
[ + ] [[ [ 1 ] + ]] def
[ - ] [[ [ 1 ] - ]] def
[ . ] [[ dup char print ]] def
[ , ] [[ drop read byte ]] def
[ pick ] ( ( ( dup ) dip swap ) dip swap ) def
[ exec ] ( ( [ 1 ] * dup ) dip swap [ 0 ] = ( drop ) ( dup ( evalstr ) dip \ exec ) if ) def
\ [ (
  ( dup [ \ ] ] =
    ( drop swap - [ 1 ] * dup [ 0 ] =
      ( drop swap drop halt [ 1 ] crank exec )
      ( swap [ \ ] ] concat pick )
      if )
    ( dup [ \ [ ] =
      ( concat swap + swap pick )
      ( concat pick )
      if )
    if )
  dup [ 1 ] swap f swap halt [ 1 ] [ 1 ] metacrank
) def
><+-,.[] dup ( i s itgl f d ) eval

test with ../crank -s 2 bootstrap.cog helloworld.bf brainfuck.cog. You may of course load your favorite brainfuck file with this method. Note that brainfuck.cog isn't a brainfuck parser in the ordinary sense; it actually defines brainfuck words and tokenizes brainfuck, running it in the native cognition environment.

It's very profound, as well, how our current syntax allows us to define an alternate syntax with great ease. It might make you wonder if it's possible to specifically craft a syntax whose job is to write other syntaxes. Another interesting observation you might have is that Cognition defines syntax by defining a prefix character as a word that uses metacrank, rather than reading symbols and deciding what to do based on symbols. It's almost as if the syntax becomes inherent to the word that's being defined.

These two ideas synthesize to create something truly exciting, but that hasn't yet been implemented in the standard library (though we very much know that it is possible). Introducing: the dialect dialect of Cognition…

7.1. The Dialect Dialect

Imagine a word mkprefix, that takes two input words (say for example [ and ]), and an operation, and automatically defines [ to apply said operation until it hits a ] character. This is possible because constructs like metacrank and def are all just regular words, so it's possible to use them as words to metaprogram with. In fact, everything is just a word (even d, i, and s), so you can imagine a hyperabstract dialect that includes words like mkprefix, using syntax to automate the process of implementing more syntax. Such a construct I have not encountered in any other programming language. Yet, in your own Cognition, you can make nearly anything a reality.
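
As a rough illustration of the idea, here is a Python toy of what a mkprefix-like word could do. mkprefix does not exist in the standard library, so everything here is hypothetical: ToyInterp, its hook field, and the Python signature of mkprefix are invented for this sketch. The hook stands in for the role metacrank plays in the definitions above: the prefix word does not read ahead, it only changes what happens to each token that arrives afterwards until the closing word shows up. Nesting is not handled.

```python
class ToyInterp:
    """Toy token interpreter; the hook is a stand-in for metacrank."""

    def __init__(self):
        self.stack = []
        self.words = {}    # word name -> function taking the interpreter
        self.hook = None   # when set, receives every incoming token

    def feed(self, token):
        if self.hook is not None:
            self.hook(token)
        elif token in self.words:
            self.words[token](self)
        else:
            self.stack.append(token)

    def run(self, source):
        for token in source.split():
            self.feed(token)
        return self.stack


def mkprefix(interp, opener, closer, operation):
    """Define `opener` as a word that applies `operation` to every token
    seen before `closer` (hypothetical; no nesting)."""

    def start(vm):
        collected = []

        def on_token(token):
            if token == closer:
                vm.hook = None                        # restore normal handling
                vm.stack.append(operation(collected))
            else:
                collected.append(token)

        vm.hook = on_token

    interp.words[opener] = start


vm = ToyInterp()
mkprefix(vm, "[", "]", list)                                 # quote-like prefix
mkprefix(vm, "sum(", ")", lambda toks: sum(int(t) for t in toks))
result = vm.run("a [ b c ] sum( 1 2 3 ) d")
print(result)
```

The point of the design is that mkprefix is itself an ordinary function that registers ordinary words, so a dialect could call it as often as it likes to grow new prefix syntax.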

Such creative things Matthew Hinton and I have discussed as possibilities regarding the standard library. Right now, the standard library has metawords that generate abstract words automatically and call them. This is possible through string concatenation and using def in the definition of another word also (this is also possible in my prior programming language Stem). We have discussed the possibility of a word that searches for word-generators to abstract its current wordlist automatically, and we have talked about the possibility of directing this abstraction framework for the purpose of solving a problem. These are conceptually possible words to write within cognition, and this might give you an idea of how powerful this idea is.

8. Theoretical Musings

There are a couple of things about Cognition that make it interesting beyond its quirks. For instance, string processing in this language is equivalent to tokenizer postprocessing, which makes string operations inherently very powerful. It also has potential applications in symbolic AI and in syntax and grammar research, where prototypes of languages and metalanguages can be tested with ease. I'd imagine that anyone writing a program that reads a configuration file would want their configuration language to be something like this, with full freedom over the syntax (and metasyntax) in which they program (think about a Cognition-based shell, or a Cognition-based operating system!). Still, the point of working on this language was never its applications; its intrinsic beauty is its own philosophical statement.
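The claim that string operations act on the tokenizer can be sketched with a small Python model. This is a deliberately simplified assumption of how such a tokenizer might look, not Cognition's implementation: it keeps only a delimiter string and an ignore string, and leaves out singlets, the whitelist/blacklist flags, and the first-character rule described earlier. `Tokenizer` and `strip_comments` are hypothetical names.

```python
class Tokenizer:
    """Tokenizer whose rules are plain strings that code may change between tokens."""

    def __init__(self, source, delims=" \n", ignores=" \n"):
        self.source = source
        self.pos = 0
        self.delims = delims      # characters that end a token
        self.ignores = ignores    # characters skipped before a token starts

    def next_token(self):
        src = self.source
        # Skip ignored characters at the start of the token.
        while self.pos < len(src) and src[self.pos] in self.ignores:
            self.pos += 1
        if self.pos >= len(src):
            return None
        start = self.pos
        # Collect until a delimiter; the delimiter itself is not consumed here,
        # so it is expected to be skipped later through the ignore list.
        while self.pos < len(src) and src[self.pos] not in self.delims:
            self.pos += 1
        return src[start:self.pos]


def strip_comments(source):
    """Treat '#' as a word that retunes the tokenizer for exactly one token."""
    tok = Tokenizer(source)
    kept = []
    while (token := tok.next_token()) is not None:
        if token == "#":
            saved = (tok.delims, tok.ignores)
            tok.delims, tok.ignores = "\n", ""   # rest of the line is one token
            tok.next_token()                     # read the comment and drop it
            tok.delims, tok.ignores = saved      # back to the previous rules
        else:
            kept.append(token)
    return kept


tokens = strip_comments("one two # rest of line\nthree")
```

Here the comment is never parsed by a special reader; changing two strings is enough to make the rest of the line arrive as a single token, which is the sense in which editing strings amounts to editing the syntax.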

9. Conclusion

Cognition can be programmed into basically any syntax you would want, and in this article we demonstrated the power of the already existing code that makes Cognition work. In short, the system allows for true syntax as code, as my friend Andrei put it; one can dynamically program and even automate the production of syntax. We didn't have the space to cover other important Cognition concepts like the Metastack and words like cd, but those can wait for a part two of this blog post. For now, let's leave off here.

{
"author": "Preston Pan",
"date": null,
"description": "Other languages are inflexible and broken. Let’s fix that.",
"image": "https://ret2pop.nullring.xyz/blog/img/coglogo.png",
"logo": null,
"publisher": null,
"title": "Cognition",
"url": "https://ret2pop.nullring.xyz/blog/cognition.html"
}
{
"url": "https://ret2pop.nullring.xyz/blog/cognition.html",
"title": "Cognition",
"description": "1. The problem Lisp programmers claim that their system of s-expression code in addition to its featureful macro system makes it a metaprogrammable and generalized system. This is of course true, but...",
"links": [
"https://ret2pop.nullring.xyz/blog/cognition.html"
],
"image": "",
"content": "<div>\n<h2 id=\"org7a309fe\"><span>1.</span> The problem</h2>\n<div>\n<p>\nLisp programmers claim that their system of s-expression code in addition to its featureful macro system makes it a\nmetaprogrammable and generalized system. This is of course true, but there's something very broken with lisp: metaprogramming\nand programming <i>aren't the same thing</i>, meaning there will always be rigid syntax within lisp\n(its parentheses or the fact that it needs to have characters that tell lisp to <i>read ahead</i>). The left parenthesis tells\nlisp that it needs to keep on reading until the right parenthesis in order to finish some process that allows it to stop\nand evaluate the whole expression. This makes the left and right parenthesis unchangable from within the language (not\nconceptually, but under some implementations it is not possible), and, more importantly, it makes the process of retroactively\nchanging the sequence in which these tokens are delimited <i>impossible</i>, without a heavy amount of string processing. Other\nlangauges have other ways in which they need to read ahead when they see a certain token in order to decide what to do.\nThis process of having a program read ahead based on current input is called <i>syntax</i>.\n</p>\n<p>\nAnd as long as you read ahead, or assume a default way of reading ahead, you fall into the trap of having some form of syntax.\nCognition is different in that it uses an antisyntax that is fully <i>postfix</i>. This has similarities with concatenative\nprogramming languages, but concatenative programming langauges also suffer from two main problems: first, the introduction\nof the left and right bracket character (which is in fact prefix notation, as it needs to read ahead of the input stream),\nand the quote character for strings. This is unsuitable for such a general language. 
You can even see the same problem\nin lisp's C syntax implementation: escape characters everywhere, awkward must-have spaces delimit the start and end\nof certain tokens (and if not, it requires post-processing). The racket programming language has its macro system,\nbut it is not <i>runtime dynamic</i>. It still utilizes preprocessing.\n</p>\n<p>\nSo, what's the percise solution to this connundrum? Well, it's beautiful; but it requires some <i>cognition</i>.\n</p>\n</div>\n</div><div>\n<h2 id=\"org0235569\"><span>2.</span> Introduction</h2>\n<div>\n<p>\nCognition is an active research project that Matthew Hinton and I have been working on for the past\ncouple of months. Although my commit history for <a target=\"_blank\" href=\"https://github.com/metacrank/cognition\">this project</a> has not been impressive, we came up with\na lot of the theory together, working alongside each other in order to achieve one of the most generalized\nsystems of syntax we know of. Let's take a look at the conceptual reason why cognition needs to exist, as\nwell as some <i>baremetal cognition</i> code (you'll see what I mean by this later). There's a paper about this language\navailable about the language in the repository, for those interested. 
Understanding cognition might require a\nlot of background in parsing, tokenization, and syntax, but I've done my best to write this in a very understandable way.\nThe repository is available at <a target=\"_blank\" href=\"https://github.com/metacrank/cognition\">https://github.com/metacrank/cognition</a>, for your information.\n</p>\n<div>\n<p><img src=\"https://ret2pop.nullring.xyz/blog/img/coglogo.png\" alt=\"coglogo.png\" />\n</p>\n<p><span>Figure 1: </span>The Cognition programming language, logo designed by Matthew Hinton</p>\n</div>\n</div>\n</div><div>\n<h2 id=\"orgc03fde5\"><span>3.</span> Baremetal Cognition</h2>\n<div>\n<p>\nBaremetal cognition has a couple of perculiar attributes, and it is remarkably like the <i>Brainfuck</i> programming language.\nBut unlike its look-alike, it has the ability to do some <i>serious metaprogramming</i>. Let's take a look at what the\nbootstrapping code for a <i>very minimal</i> syntax looks like:\n</p>\n<pre>\nldfgldftgldfdtgl\ndf \ndfiff1 crank f\n</pre>\n<p>\nAnd <b>do</b> note the whitespace (line 2 has a whitespace after df, line 3 has a whitespace, and the newlines matter).\nErm, okay. What?\n</p>\n<p>\nSo, our goal in this post is to get from a syntax that looks like <i>that</i> to a syntax that looks like <a target=\"_blank\" href=\"https://ret2pop.nullring.xyz/blog/stem.html\">Stem</a>.\nBut how on earth does this piece of code even work? Well, we have to introduce two new ideas: delimiters, and ignores.\n</p>\n</div>\n<div>\n<h3 id=\"org8447c55\"><span>3.1.</span> Tokenization</h3>\n<div>\n<p>\nDelimiters allow the tokenizer to figure out when one token ends and another begins. The list of single character tokenizers\nis public, allowing that list to be modified and read from within cognition itself. 
Ignored characters are characters\nthat are completely ignored by the tokenizer in the first stage of every read-eval-print loop; that is, at the start of\ncollecting the token, it fist skips a set of ignored characters. By default, every single character is a delimiter, and\nno characters are ignored characters. The delimiter and ignored characters list allows you to toggle a flag to tell it\nto blacklist or whitelist the given characters, adding brevity (and practicality) to the language.\n</p>\n<p>\nLet's take the first line of code as an example:\n</p>\n<pre>\nldfgldftgldfdtgl\n</pre>\n<p>\nbecause of the delimiter and ignored rules set by default, every single character is read as a token, and no character\nis skipped. We therefore read the first character, <code>l</code>. By default, Cognition works off a stack-based programming language\ndesign. If you're not familiar, see the <a target=\"_blank\" href=\"https://ret2pop.nullring.xyz/blog/stem.html\">Stem blogpost</a> for more detail (in fact if you're not familiar this <i>won't work</i>\nas an explanation for you, so you should see it, or read up on the <i>Forth</i> programming language).\nThough, we call them <i>containers</i>, as they are more general than stacks. Additionally, in this default environment, <i>no</i>\nword is executed except for special <i>faliases</i>, as we will cover later.\n</p>\n<p>\nTherefore, the character <code>l</code> gets read in and is put on the stack. Then, the character <code>d</code> is read in and put on the stack.\nBut <code>f</code> is different. In order to execute words in Cognition, we must take a look at the falias system.\n</p>\n</div>\n</div>\n<div>\n<h3 id=\"org582a989\"><span>3.2.</span> Faliases</h3>\n<p>\nFaliases are a list of words that get executed when they are put on the stack, or container as we will call it in the future.\nAll of them in fact execute the equivalent of <code>eval</code> in stem but as soon as they are put on their container. 
Meaning, when\n<code>f</code>, the default falias, is run, it doesn't go on the container, but rather executes the top of the container which is <code>d</code>.\n<code>d</code> changes the delimiter list to the string value of a word, meaning that it changes the delimiters to <i>blacklist</i> only\nthe character <code>l</code> as a delimiter. Everything else by default is a delimiter because everything by default is parsed\ninto single character words.\n</p>\n</div>\n<div>\n<h3 id=\"orgc25ab41\"><span>3.3.</span> Delimiter Caveats</h3>\n<div>\n<p>\nDelimiters have an interesting rule, and that is that the delimiter character is excluded from the tokenized word\nunless we have not ignored a character in the tokenization loop, in which case we collect the character as a part of\nthe current token and keep going. This is in contrast to a third kind of tokenization category called the singlet, which\n<i>includes</i> itself into a token before skipping itself and ending the tokenization collection.\n</p>\n<p>\nIn addition, remember what I said about the <i>blacklist</i>? Well, you can toggle between <i>blacklisting</i> and <i>whitelisting</i>\nyour list of delimiters, singlets, and ignored characters. By default, there are no <i>blacklisted</i> delimiters, no\n<i>whitelisted</i> singlets, and no <i>whitelisted</i> ignored characters.\n</p>\n<p>\nWe then also observe that all other characters will simply skip themselves while being collected as a part of the current\ntoken, without ending this loop, therefore collecting new characters until the loop halts via delimiter or singlet rules.\n</p>\n</div>\n</div>\n<div>\n<h3 id=\"org6531c4b\"><span>3.4.</span> Continuing the Bootstrap Code</h3>\n<div>\n<p>\nSo far, we looked at this part of the code:\n</p>\n<pre>\nldf\n</pre>\n<p>\nwhich simply creates <code>l</code> as a non-delimiter. 
Now, for the rest of the code:\n</p>\n<pre>\ngldftgldfdtgl\ndf \ndfiff1 crank f\n</pre>\n<p>\n<code>gldf</code> puts <code>gl</code> on the stack due to <code>d</code> being a delimiter, and <code>f</code> is called on it, meaning that now <code>g</code> and <code>l</code> are\nthe only non-delimiters. Then, <code>tgl</code> gets put on the stack and they become non-delimiters with <code>df</code>. <code>dtgl</code> gets\nput on the stack, and the newline becomes the only non-delimiter with <code>\\ndf</code> (yes, the newline is actually a part of\nthe code here, and spaces need to be as well in order for this to work). Then, the space character, due to how delimiter\nrules work (if you don't ignore, the first character is parsed normally even if it is a delimiter)\nand <code>\\n</code> gets put on the stack. Then, another <code>\\ \\n</code> word is tokenized (you might not see it, but there's another\nspace on line 3). The current stack looks like this (bottom to top):\n</p>\n<pre>\n3. dtgl\n2. [space char]\\n\n1. [space char]\\n\n</pre>\n<p>\n<code>df</code> sets the non-delimiters to <code>\\ \\n</code>. <code>if</code> sets the ignores to <code>\\ \\n</code>, which ignores these characters at the start\nof tokenization. <code>f</code> executes <code>dtgl</code>, which is a word that toggles the <i>dflag</i>, the flag that stores the whitelist/blacklist\ndistinction for delimiters. Now, all non-delimiters are delimiters and all delimiters are non-delimiters.\nFinally, we're put in an environment where spaces and newlines are the delimiters for tokens, and they are ignored at the\nstart of tokenizing a token. Next, <code>1</code> is tokenized and put on the stack, and then the <code>crank</code> word, which is then executed\nby <code>f</code> (the <code>1</code> token is treated as a number in this case, but everything textual in cognition is a word).\nWe are done our bootstrapping sequence! Now, you might wonder what <code>crank</code> does. 
That we will explain in a later section.\n</p>\n</div>\n</div>\n</div><div>\n<h2 id=\"org6f57bd7\"><span>4.</span> Bootstrapping Takeaways</h2>\n<p>\nFrom this, we see a couple principles: first, cognition is able to change how it tokenizes on the fly and it can do it\nprogrammatically, allowing you to program a program in cognition that would theoretically automate the process of changing\nthese delimiters, singlets, and ignores. This is something impossible in other languages, being able to\n<i>program your own tokenizer for some foreign language from within cognition</i>, and have\n<i>future code be tokenized exactly like how you want it to be</i>. This is solely possible because the language is postfix\nand doesn't read ahead, so it doesn't require more than one token to be parsed before an expression is evaluated. Second,\nfaliases allow us to execute words without having to have prefix words or any default execution of words.\n</p>\n</div><div>\n<h2 id=\"org3a0a882\"><span>5.</span> Crank</h2>\n<div>\n<p>\nThe <i>metacrank</i> system allows us to set a default way in which tokens are executed on the stack. The <code>crank</code> word takes\na number as its argument and by effect executes the top of the stack for every <code>n</code> words you put on the stack. To make\nthis concept concrete, let's look at some code (running from what we call <i>crank 1</i> as we set our environment to\ncrank one at the end of the bootstrapping sequence):\n</p>\n<pre>\n5 crank 2crank 2 crank\n1 crank unglue swap quote prepose def\n</pre>\n<p>\nthe crank 1 environment allows us to stop using <code>f</code> in order to evaluate tokens. Instead, every <i>1</i> token that is\ntokenized is evaluated. Since we programmed in a newline and space-delimited syntax, we can safely interpret this code\nintuitively.\n</p>\n<p>\nThe code begins by trying to evaluate <code>5</code>, which evaluates to itself as it is not a builtin. 
<code>crank</code> evaluates and puts\nus in 5 crank, meaning every <i>5th</i> token evaluates from here on. <code>2crank</code>, <code>2</code>, <code>crank</code>, <code>1</code> are all put on the stack,\nleaving us with a stack that looks like so (notice that <code>crank</code> doesn't get executed even though it is a bulitin because\nwe set ourselves to using crank 5):\n</p>\n<pre>\n4. 2crank\n3. 2\n2. crank\n1. 1\n</pre>\n<p>\n<code>crank</code> is the 5th word, so it executes. Note that this puts us back in crank 1, meaning every word is evaluated.\n<code>unglue</code> is a builtin that gets the value of the word at the top of the stack (as <code>1</code> is used up by the <code>crank</code> we\nevaluated), and so it gets the value of <code>crank</code>, which is a builtin. What that in effect does is it gets the function\npointer associated with the crank builtin. Our new stack looks like this:\n</p>\n<pre>\n3. 2crank\n2. 2\n1. [CLIB]\n</pre>\n<p>\nWhere CLIB is our function pointer that points to the <code>crank</code> builtin. We then <code>swap</code>:\n</p>\n<pre>\n3. 2crank\n2. [CLIB]\n1. 2\n</pre>\n<p>\nthen <code>quote</code>, a builtin that quotes the top thing on the stack:\n</p>\n<pre>\n3. 2crank\n2. [CLIB]\n1. [2]\n</pre>\n<p>\nthen prepose, a builtin like <code>compose</code> in stem, except that it preposes and that it puts things in what we call a VMACRO:\n</p>\n<pre>\n2. 2crank\n1. ( [2] [CLIB] )\n</pre>\n<p>\nthen we call <code>def</code>. This defines a word <code>2crank</code> that puts <code>2</code> on the stack and then calls a function pointer pointing\nus to the crank builtin. Now, we still have to define what VMACROs are, and in order to do that we might have to explain\nsome differences between the cognition stack and the stem stack.\n</p>\n</div>\n<div>\n<h3 id=\"org470c986\"><span>5.1.</span> Differeneces</h3>\n<p>\nIn the stem stack, putting words on the stack directly is allowed. 
In cognition, words are put in containers when\nthey are put on the stack and not evaluated. This means words like <code>compose</code> in stem work on words (or more accurately\ncontainers with a single word in them) as well as other containers, making the API for this language more consistent.\nAdditionally, words like <code>cd</code> as we will make use of this concept.\n</p>\n<div>\n<h4 id=\"org2bcc158\"><span>5.1.1.</span> Macros</h4>\n<p>\nMacros are another difference between stem quotes and cognition containers. When macros are evaluated, everything in\nthe macro is evaluated, ignoring the crank. If bound to a word, evaluating that word evaluates the macro which will ignore\nthe crank completely and will only increment the cranker by one, while evaluating each statement in the macro. They\nare useful for making crank-agnostic code, and expanding macros is very useful for the purpose of optimization, although\nwe will actually have to write the word <code>expand</code> from more primitive words later on (hint: it uses recursive <code>unglue</code>).\n</p>\n</div>\n</div>\n<div>\n<h3 id=\"org4cae568\"><span>5.2.</span> More Code</h3>\n<div>\n<p>\nHere is te rest of the code in <code>bootstrap.cog</code> in <code>coglib/</code>:\n</p>\n<pre>\ngetd dup _ concat _ swap d i \n_quote_swap_quote_compose_swap_dup_d_i eval \n2crank ing 0 crank spc\n2crank ing 1 crank swap quote def\n2crank ing 0 crank endl\n2crank ing 1 crank swap quote def\n2crank ing 1 crank\n2crank ing 3 crank load ../coglib/ quote\n2crank ing 2 crank swap unglue concat unglue fread unglue evalstr unglue\n2crank ing 1 crank compose compose compose compose VMACRO cast def\n2crank ing 1 crank\n2crank ing 1 crank getargs 1 split swap drop 1 split drop\n2crank ing 1 crank\n2crank ing 1 crank epop drop\n2crank ing 1 crank INDEX spc OUT spc OF spc RANGE\n2crank ing 1 crank concat concat concat concat concat concat =\n2crank ing 1 crank\n2crank ing 1 crank missing spc filename concat concat dup endl 
concat\n2crank ing 1 crank swap quote swap quote compose\n2crank ing 2 crank print compose exit compose\n2crank ing 1 crank\n2crank ing 0 crank fread evalstr\n2crank ing 1 crank compose\n2crank ing 1 crank\n2crank ing 1 crank if\n</pre>\n<p>\nOkay, well, the syntax still doesn't look so good, and it's still pretty hard to get what this is doing. But the\nbasic idea is that <code>2crank</code> is a macro and is therefore crank agnostic, and we guarantee its execution with <code>ing</code>, another\nfalias (because it's funny). Then, we execute an <code>n crank</code>, which standardizes what crank each line is in (you might\nwonder what <code>ing</code> and <code>f</code>'s interaction is with the cranker. It actually just guarantees the evaluation of the previous\nthing, so if the previous thing already evaluated <code>f</code> and <code>ing</code> both do nothing). In any case, this defines words that\nare useful, such as <code>load</code>, which loads something from the coglib. It does this by <code>compose</code>-ing things into quotes and\nthen <code>def</code>-ing those quotes.\n</p>\n<p>\nThe crank, and by extension, the metacrank system is needed in order to discriminate between <i>evaluating</i> some tokens\nand <i>storing</i> others for metaprogramming without having to use <code>f</code>, while also keeping the system postfix. Crank\nis just one word that allows for this type of behavior; the more general word, <code>metacrank</code>, allows for much more\ninteresting kinds of syntax manipulation. We have examples of <code>metacrank</code> down the line, but for now I should explain\nthe <i>metacrank word</i>.\n</p>\n</div>\n</div>\n<div>\n<h3 id=\"org6cb40f0\"><span>5.3.</span> Metacrank</h3>\n<div>\n<p>\n<code>n m metacrank</code> sets a periodic evaluation <code>m</code> for an element <code>n</code> items down the stack. The <code>crank</code> word is therefore\nequivalent to <code>0 m metacrank</code>. 
Only one token can be evaluated per tokenized token, although <i>every</i> metacrank is incremented\nper token, where lower metacranks get priority. This means that if you set two different metacranks, only <i>one</i> of them\ncan execute per token tokenized, and the lower metacrank gets priority. Note that metacrank and, by extension, crank,\ndon't <i>just</i> depend on tokenized words; they also work while evaluating word definitions recursively, meaning if a word\nis evaluated in <code>2 crank</code>, one out of two words will execute in each level of the evaluation tree. You can play around\nwith this in the repl to get a sense of how it works: run <code>../crank bootstrap.cog repl.cog devel.cog load</code>\nin the coglib folder, and use stem like syntax in order to define a function. Then, run that function in <code>2 crank</code>.\nYou will see how the evaluation tree respects cranking in the same way that the program file itself does.\n</p>\n<p>\nMetacrank allows for not only metaprogramming in the form of code building, but also\ndirect syntax manipulation (i.e. <i>I want to execute this token once I have read n other token(s)</i>). The advantages to\nthis system compared to other programming languages' systems are clear: you can program a prefix word and <code>undef</code> it\nwhen you want to rip out that part of syntax. You can write a prefix character that doesn't stop at an ending character\nbut <i>always</i> stops when you read a certain number of tokens. You can feed user input into a math program and feed the\noutput into a syntax system like metacrank. The possibilities are endless! 
And with that, we will slowly build up the\n<code>stem</code> programming language, v2, now with macros and from within our own <i>cognition</i>.\n</p>\n</div>\n</div>\n</div><div>\n<h2 id=\"orga5bcdec\"><span>6.</span> The Stem Dialect, Improved</h2>\n<div>\n<p>\nIn this piece of code, we define the <i>comment</i>:\n</p>\n<pre>\n2crank ing 0 crank ff 1\n2crank ing 1 crank cut unaliasf\n2crank ing 0 crank 0\n2crank ing 1 crank cut swap quote def\n2crank ing 0 crank\n2crank ing 0 crank #\n2crank ing 0 crank geti getd gets crankbase f d f i endl s\n2crank ing 1 crank compose compose compose compose compose compose compose compose compose\n2crank ing 0 crank drop halt crank s d i\n2crank ing 1 crank compose compose compose compose compose VMACRO cast quote compose\n2crank ing 0 crank halt 1 quote ing 1 quote ing metacrank\n2crank ing 1 crank compose compose compose compose VMACRO cast\n2crank ing 1 crank def\n2crank ing 2 crank # singlet # delim\n2crank ing 1 crank #comment: geti getd gets crankbase '' d '' i '\\n' s ( drop halt crank s d i ) halt 1 1 metacrank\n</pre>\n<p>\nand it is our first piece of code that builds something <i>truly</i> prefix. The comment character is a prefix that drops\nall the text before the newline character, which is a type of word that tells the parser to <i>read ahead</i>. This is our\nfirst indication that everything that we thought was possible within cognition truly <i>is</i>.\n</p>\n<p>\nBut before that, we can look at the first couple of lines:\n</p>\n<pre>\n2crank ing 0 crank ff 1\n2crank ing 1 crank cut unaliasf\n2crank ing 0 crank 0\n2crank ing 1 crank cut swap quote def\n2crank ing 0 crank\n</pre>\n<p>\nwhich simply unaliases <code>f</code> from the falias list, with <code>ing</code> being the only remaining falias. 
In cognition, even these\nfaliases are changeable.\n</p>\n<p>\nSince we can't put <code>f</code> directly on the stack (if we try by just using <code>f</code>, it would execute), we instead utilize some\nvery minimal string processing to do it, putting <code>ff</code> on the stack and then cutting the string in half to get two copies\nof <code>f</code>. We then want <code>f</code> to mean false, which in cognition is just an empty word. Therefore, we make an empty word by\ncalling <code>0 cut</code> on this string, and then <code>def</code>-ing f to the empty string. The following code is where the comment is\ndefined:\n</p>\n<pre>\n2crank ing 0 crank #\n2crank ing 0 crank geti getd gets crankbase f d f i endl s\n2crank ing 1 crank compose compose compose compose compose compose compose compose compose\n2crank ing 0 crank drop halt crank s d i\n2crank ing 1 crank compose compose compose compose compose VMACRO cast quote compose\n2crank ing 0 crank halt 1 quote ing 1 quote ing metacrank\n2crank ing 1 crank compose compose compose compose VMACRO cast\n2crank ing 1 crank def\n2crank ing 2 crank # singlet # delim\n2crank ing 1 crank #comment: geti getd gets crankbase '' d '' i '\\n' s ( drop halt crank s d i ) halt 1 1 metacrank\n</pre>\n<p>\nRelevant: <code>halt</code> just puts you in 0 for all metacranks, and <code>VMACRO cast</code> just turns the top thing on the stack from a\ncontainer to a macro. <code>geti</code>, <code>getd</code>, <code>gets</code> gets the ignores, delims, and singlets respectively as a string; <code>drop</code> is\n<code>dsc</code> in stem. <code>singlet</code> and <code>delim</code> sets the singlets and delimiters. <code>endl</code> is defined withint <code>bootstrap.cog</code> and just\nputs the newline character as a word on the stack. 
<code>crankbase</code> gets the current crank.\n</p>\n<p>\nwe call a lot of <code>compose</code> words in order to build this definition, and we make the <code>#</code> character a singlet delimiter in\norder to allow for spaces after the comment. We put ourselves in <code>1 1 metacrank</code> in the <code>#</code> definition while altering\nthe tokenization rules beforehand in order to tokenize everything until a newline as a token while calling <code>#</code> on said word\nin order to effectively drop that comment and get ourselves back in the original crank and metacrank. Thus, the brilliant\n<code>#</code> character is written, operating on a token that is tokenized <i>in the future</i>, with complete default postfix syntax.\nWith the information above, one can work out the specifics of how it works; the point is that it <i>does</i>, and one can test\nthat it does by going into the <code>coglib</code> folder and running <code>../crank bootstrap.cog repl.cog devel.cog load</code>, which will load\nthe REPL and load <code>devel.cog</code>, which will in turn load <code>comment.cog</code>.\n</p>\n</div>\n<div>\n<h3 id=\"org38387f1\"><span>6.1.</span> The Great Escape</h3>\n<div>\n<p>\nHere, we accelerate our way out of this primitive syntax, and it all starts with the great escape character. We make\nmany great leaps in this section that aren't entirely explained for the sake of brevity, but you are free to play around\nwith all of these things by using the repl. In any case, I hope you will enjoy this great leap in syntax technology; by\nthe end, we will have reached something with real <i>structure</i>.\n</p>\n<p>\nHere we define a preliminary prefix escape character. 
Also you will notice that <code>2crank ing 0 crank</code> is used as\npadding between lines:\n</p>\n<pre>\n2crank ing 2 crank comment.cog load\n2crank ing 0 crank\n2crank ing 1 crank # preliminary escape character \\\n2crank ing 1 crank \\\n2crank ing 0 crank halt 1 quote ing crank\n2crank ing 1 crank compose compose\n2crank ing 2 crank VMACRO cast quote eval\n2crank ing 0 crank halt 1 quote ing dup ing metacrank\n2crank ing 1 crank compose compose compose compose\n2crank ing 2 crank VMACRO cast\n2crank ing 1 crank def\n2crank ing 0 crank\n2crank ing 0 crank\n</pre>\n<p>\nThis allows for escaping so that we can put something on the stack even if it is to be evaluated,\nbut we want to redefine this character eventually to be compatible with stem-like quotes. We're\neven using our comment character in order to annotate this code by now! Here is the full quote definition (once we have\nthis definition, we can use it to improve itself):\n</p>\n<pre>\n2crank ing 0 crank [\n2crank ing 0 crank\n2crank ing 1 crank # init\n2crank ing 0 crank crankbase 1 quote ing metacrankbase dup 1 quote ing =\n2crank ing 1 crank compose compose compose compose compose\n2crank ing 0 crank\n2crank ing 1 crank # meta-crank-stuff0\n2crank ing 3 crank dup ] quote =\n2crank ing 1 crank compose compose\n2crank ing 16 crank drop swap drop swap 1 quote swap metacrank swap crank quote\n2crank ing 3 crank compose dup quote dip swap\n2crank ing 1 crank compose compose compose compose compose compose compose compose\n2crank ing 1 crank compose compose compose compose compose \\ VMACRO cast quote compose\n2crank ing 3 crank compose dup quote dip swap\n2crank ing 1 crank compose compose compose \\ VMACRO cast quote compose \\ if compose\n2crank ing 1 crank \\ VMACRO cast quote quote compose\n2crank ing 0 crank\n2crank ing 1 crank # meta-crank-stuff1\n2crank ing 3 crank dup ] quote =\n2crank ing 1 crank compose compose\n2crank ing 16 crank drop swap drop swap 1 quote swap metacrank swap crank\n2crank ing 
1 crank compose compose compose compose compose compose compose compose \\ VMACRO cast quote compose\n2crank ing 3 crank compose dup quote dip swap\n2crank ing 1 crank compose compose compose \\ VMACRO cast quote compose \\ if compose\n2crank ing 1 crank \\ VMACRO cast quote quote compose\n2crank ing 0 crank\n2crank ing 1 crank # rest of the definition\n2crank ing 16 crank if dup stack swap 0 quote crank\n2crank ing 2 crank 1 quote 1 quote metacrank\n2crank ing 1 crank compose compose compose compose compose compose compose compose\n2crank ing 1 crank compose \\ VMACRO cast\n2crank ing 0 crank\n2crank ing 1 crank def\n</pre>\n<p>\nUm, it's quite the spectacle how Matthew Hinton ever came up with this thing, but alas, it exists. Then, we use it in\norder to redefine itself, but better as the old quote definition can't do recursive quotes\n(we can do this because the definition is <i>used</i> before you redefine the word due to postfix <code>def</code>, a\ndevelopment pattern seen often in low level cognition):\n</p>\n<pre>\n\\ [\n[ crankbase ] [ 1 ] quote compose [ metacrankbase dup ] compose [ 1 ] quote compose [ = ] compose\n[ dup ] \\ ] quote compose [ = ] compose\n[ drop swap drop swap ] [ 1 ] quote compose [ swap metacrank swap crank quote compose ] compose\n[ dup ] quote compose [ dip swap ] compose \\ VMACRO cast quote compose\n[ dup dup dup ] \\ [ quote compose [ = swap ] compose \\ ( quote compose [ = or swap ] compose \\ \\ quote compose [ = or ] compose\n[ eval ] quote compose\n[ compose ] [ dup ] quote compose [ dip swap ] compose \\ VMACRO cast quote compose [ if ] compose \\ VMACRO cast\nquote compose [ if ] compose \\ VMACRO cast quote quote\n[ dup ] \\ ] quote compose [ = ] compose\n[ drop swap drop swap ] [ 1 ] quote compose [ swap metacrank swap crank ] compose \\ VMACRO cast quote compose\n[ dup dup dup ] \\ [ quote compose [ = swap ] compose \\ ( quote compose [ = or swap ] compose \\ \\ quote compose [ = or ] compose\n[ eval ] quote compose\n[ 
compose ] [ dup ] quote compose [ dip swap ] compose \\ VMACRO cast quote compose [ if ] compose \\ VMACRO cast\nquote compose [ if ] compose \\ VMACRO cast quote quote\ncompose compose [ if dup stack swap ] compose [ 0 ] quote compose [ crank ] compose\n[ 1 ] quote dup compose compose [ metacrank ] compose \\ VMACRO cast\ndef\n</pre>\n<p>\nOkay, so now we can use recursive quoting, just like in stem. But there are still a couple things missing that we probably\nwant: a good string quote implementation, and probably escape characters that work in the brackets. Also, since Cognition\nutilizes macros, we probably want a way to notate those as well, and we probably want a way to expand macros. We can do\nall of that! First, we will have to redefine <code>\\</code> once more:\n</p>\n<pre>\n\\ \\\n[ [ 1 ] metacrankbase [ 1 ] = ]\n[ halt [ 1 ] [ 1 ] metacrank quote compose [ dup ] dip swap ]\n\\ VMACRO cast quote quote compose\n[ halt [ 1 ] crank ] VMACRO cast quote quote compose\n[ if halt [ 1 ] [ 1 ] metacrank ] compose \\ VMACRO cast\ndef\n</pre>\n<p>\nThis piece of code defines the bracket but for macros (split just splits a list into two):\n</p>\n<pre>\n\\ (\n\\ [ unglue\n[ 11 ] split swap [ 10 ] split drop [ macro ] compose\n[ 18 ] split quote [ prepose ] compose dip\n[ 17 ] split eval eval\n[ 1 ] del [ \\ ) ] [ 1 ] put\nquote quote quote [ prepose ] compose dip\n[ 16 ] split eval eval\n[ 1 ] del [ \\ ) ] [ 1 ] put\nquote quote quote [ prepose ] compose dip\nprepose\ndef\n</pre>\n<p>\nWe want these macros to automatically expand because it's more efficient to bind already expanded macros to words,\nand they functionally evaluate identically (<code>isdef</code> just returns a boolean where true is a non-empty string, false\nis an empty string, if a word is defined):\n</p>\n<pre>\n\\ (\n( crankbase [ 1 ] metacrankbase dup [ 1 ] =\n [ ( dup \\ ) =\n ( drop swap drop swap [ 1 ] swap metacrank swap crank quote compose ( dup ) dip swap )\n ( dup dup dup \\ [ = swap \\ ( 
= or swap \\ \\ = or\n ( eval )\n ( dup isdef ( unglue ) [ ] if compose ( dup ) dip swap )\n if )\n if ) ]\n [ ( dup \\ ) =\n ( drop swap drop swap [ 1 ] swap metacrank swap crank )\n ( dup dup dup \\ [ = swap \\ ( = or swap \\ \\ = or\n ( eval )\n ( dup isdef ( unglue ) [ ] if compose ( dup ) dip swap )\n if )\n if ) ]\n if dup macro swap\n [ 0 ] crank [ 1 ] [ 1 ] metacrank ) def\n</pre>\n<p>\nand you can see that as we define more things, our language is beginning to look more or less like it has syntax!\nIn this <code>quote.cog</code> file which we have been looking at, there are more things, but the bulk of it is pretty much done.\nFrom here on, I will just explain the syntax programmed by quote.cog instead of showing the specific code.\n</p>\n<p>\nAs an example, here is <code>expand</code>:\n</p>\n<pre>\n# define basic expand (works on nonempty macros only)\n[ expand ]\n( macro swap\n ( [ 1 ] split\n ( isword ( dup isdef ( unglue ) ( ) if ) ( ) if compose ) dip\n size [ 0 ] &gt; ( ( ( dup ) dip swap ) dip swap eval ) ( ) if )\n dup ( swap ( swap ) dip ) dip eval drop swap drop ) def\n# complete expand (checks for definitions within child first without copying hashtables)\n[ expand ]\n( size [ 0 ] &gt; ( type [ VSTACK ] = ) ( return ) if ?\n ( macro swap\n macro\n ( ( ( size dup [ 0 ] &gt; ) dip swap ) dip swap\n ( ( ( 1 - dup ( vat ) dip swap ( del ) dip ) dip compose ) dip dup eval )\n ( drop swap drop )\n if ) dup eval\n ( ( [ 1 ] split\n ( isword\n ( compose cd dup isdef\n ( unglue pop )\n ( pop dup isdef ( unglue ) ( ) if )\n if ) ( ) if\n ( swap ) dip compose swap ) dip\n size [ 0 ] &gt; ) dip swap\n ( dup eval ) ( drop drop swap compose ) if ) dup eval )\n ( expand )\n if ) def\n</pre>\n<p>\nWhich recursively expands word definitions inside a quote or macro, using the word <code>unglue</code>. 
We've used the <code>expand</code>\nword in order to redefine itself in a more general case.\n</p>\n</div>\n</div>\n</div><div>\n<h2 id=\"org82fa718\"><span>7.</span> The Brainfuck Dialect</h2>\n<div>\n<p>\nAnd returning whence we came, we define the <i>Brainfuck</i> dialect with our current advanced stem dialect:\n</p>\n<pre>\ncomment.cog load\nquote.cog load\n[ ] [ ] [ 0 ]\n[ &gt; ] [[ swap [[ compose ]] dip size [ 0 ] = [ [ 0 ] ] [[ [ 1 ] split swap ]] if ]] def\n[ &lt; ] [[ prepose [[ size dup [ 0 ] = [ ] [[ [ 1 ] - split ]] if ]] dip swap ]] def\n[ + ] [[ [ 1 ] + ]] def\n[ - ] [[ [ 1 ] - ]] def\n[ . ] [[ dup char print ]] def\n[ , ] [[ drop read byte ]] def\n[ pick ] ( ( ( dup ) dip swap ) dip swap ) def\n[ exec ] ( ( [ 1 ] * dup ) dip swap [ 0 ] = ( drop ) ( dup ( evalstr ) dip \\ exec ) if ) def\n\\ [ (\n ( dup [ \\ ] ] =\n ( drop swap - [ 1 ] * dup [ 0 ] =\n ( drop swap drop halt [ 1 ] crank exec )\n ( swap [ \\ ] ] concat pick )\n if )\n ( dup [ \\ [ ] =\n ( concat swap + swap pick )\n ( concat pick )\n if )\n if )\n dup [ 1 ] swap f swap halt [ 1 ] [ 1 ] metacrank\n) def\n&gt;&lt;+-,.[] dup ( i s itgl f d ) eval\n</pre>\n<p>\nTest with <code>../crank -s 2 bootstrap.cog helloworld.bf brainfuck.cog</code>. You may, of course, load your favorite brainfuck\nfile with this method. Note that brainfuck.cog isn't a brainfuck parser in the ordinary sense; it actually\n<i>defines brainfuck words</i> and <i>tokenizes</i> brainfuck, running it in the native Cognition environment.\n</p>\n<p>\nIt is also profound how our current syntax allows us to define an <i>alternate</i> syntax with great ease. It might\nmake you wonder if it's possible to <i>specifically craft</i> a syntax whose job is to write other syntaxes. Another interesting\nobservation you might make is that Cognition defines syntax by defining a prefix character as a <i>word</i> that uses metacrank,\nrather than reading symbols and deciding what to do based on symbols. 
It's almost as if the syntax becomes <i>inherent</i> to the\nword that's being defined.\n</p>\n<p>\nThese two ideas synthesize to create something truly exciting, but that hasn't yet been implemented in the standard library\n(though we very much know that it is possible). Introducing: the <i>dialect dialect</i> of Cognition…\n</p>\n</div>\n<div>\n<h3 id=\"org76d486f\"><span>7.1.</span> The Dialect Dialect</h3>\n<div>\n<p>\nImagine a word <code>mkprefix</code> that takes two input words (say, for example, <code>[</code> and <code>]</code>) and an operation, and\n<i>automatically defines</i> <code>[</code> to apply said operation until it hits a <code>]</code> character. This is possible because constructs\nlike <code>metacrank</code> and <code>def</code> are all just <i>regular words</i>, so it's possible to use <i>them</i> as words to metaprogram with.\nIn fact, <i>everything</i> is just a word (even <code>d</code>, <code>i</code>, and <code>s</code>), so you can imagine a hyperabstract dialect that includes\nwords like <code>mkprefix</code>, using syntax to automate the process of implementing more syntax. I have not encountered such a construct\nin <i>any other programming language</i>. Yet, in your own <i>Cognition</i>, you can make nearly anything a reality.\n</p>\n<p>\nMatthew Hinton and I have discussed such creative possibilities for the standard library. Right now, the\nstandard library has metawords that generate abstract words automatically and call them. This is possible through string\nconcatenation and by using <code>def</code> inside the definition of another word (this is also possible in my prior programming\nlanguage Stem). We have discussed the possibility of a word that searches for word-generators to abstract its current\nwordlist automatically, and we have talked about the possibility of directing this abstraction framework for the purpose\nof solving a problem. 
These words are conceptually possible to write within Cognition, and this might give you an idea\nof how <i>powerful</i> this idea is.\n</p>\n</div>\n</div>\n</div><div>\n<h2 id=\"org0485dca\"><span>8.</span> Theoretical Musings</h2>\n<p>\nThere are a couple of things about Cognition that make it interesting beyond its quirks. For instance,\nstring processing in this language is equivalent to tokenizer postprocessing, which makes string operations inherently\nextremely powerful. It also has potential applications in Symbolic AI and in syntax and grammar research,\nwhere prototypes of languages and metalanguages can be tested with ease. I'd imagine that anyone configuring a program\nthat reads a configuration file would really want their configuration language to be something like this, where they can\nhave full freedom over the syntax (and metasyntax) in which they program (think about a Cognition-based shell,\nor a Cognition-based operating system!). The point of working on this language, though, was never its applications;\nits intrinsic beauty is its own philosophical statement.\n</p>\n</div><div>\n<h2 id=\"org6798ca1\"><span>9.</span> Conclusion</h2>\n<p>\nYou can imagine that Cognition can program basically any syntax you would want, and in this article, we demonstrated the power\nof the already existing code that makes Cognition work. In short, the system allows for true <i>syntax as code</i>, as my\nfriend Andrei put it; one can <i>dynamically program</i> and even <i>automate</i> the production of syntax. In this article, we\ndidn't have the space to cover other important Cognition concepts like the <i>Metastack</i> and words like <code>cd</code>, but this\ncan be done in a part 2 of this blog post. For now, let's leave off here and meet once more for <i>part two</i>.\n</p>\n</div>",
"author": "Preston Pan",
"favicon": "https://ret2pop.nullring.xyz/favicon-16x16.png",
"source": "ret2pop.nullring.xyz",
"published": "",
"ttr": 1250,
"type": ""
}