We were clever enough to turn a laundry list into poetry.

— Umberto Eco, Foucault’s Pendulum

2018-08-27 Eggs updated to CHICKEN 5

I’ve recently ported most of my Scheme libraries to CHICKEN 5.

In particular, the following libraries are now available: begin-syntax, fuse, git, isaac, magic, module-declarations, pledge, r7rs, schematic, sysexits, type-extensions, and with-current-directory. I’ve also renamed my getopt library to optimism and released it to the egg servers.

I won’t be porting the following eggs, which are obsolete and unsupported: bitcoin, glfw, and yahoo-finance.

I’ll probably also port combinator-birds for fun, but it won’t be released to the egg servers.

2018-02-15 Improved Scheme language support in Vim

The next release of Vim, version 8.1, will include the runtime files from my vim-scheme plugin, and I’ll be maintaining Vim’s built-in Scheme support from now on.

Development will still happen in the linked repository, so you can continue to use the plugin if you want to stay as up-to-date as possible.

If you run into any problems editing Scheme with stock Vim, however, please don’t hesitate to contact me. Patches are also very welcome. In particular, I’d like to extend the keyword highlighting to support dialects other than CHICKEN (which is already pretty well-covered).

2017-01-03 Monitoring mail with kqueue

Like most people, my email setup is a Byzantine horror of shell scripts and Unix abuse. In an effort to further complicate things, I recently wrote a program that monitors a mail spool and runs a command whenever it receives new messages. Here it is in its entirety. It’s very simple and probably not of general use, but I figured I’d share it anyway since it turned out to be a nice little example of how to use kqueue(2).

I’m running it like so, where sortmail does some tagging and shuffles messages into the correct folders:

$ mm "$HOME/bin/sortmail"

The kqueue(2) API is pleasant enough, and strikes me as hitting just the right level of abstraction for its intended use.
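
For the curious, the core of the kqueue(2) usage boils down to something like the following sketch. This is an approximation rather than mm’s actual source: error handling is minimal, and it assumes the spool path comes from $MAIL.

/*
 * Watch a mail spool for writes and run a handler on each change.
 * (Sketch only; not the real mm.)
 */
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#include <err.h>
#include <fcntl.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  struct kevent change, event;
  const char *spool;
  int kq, fd;

  if (argc != 2)
    errx(1, "usage: mm <command>");
  if ((spool = getenv("MAIL")) == NULL)
    errx(1, "MAIL is unset");
  if ((fd = open(spool, O_RDONLY)) == -1)
    err(1, "open");
  if ((kq = kqueue()) == -1)
    err(1, "kqueue");

  /* Register interest in writes to (and extensions of) the spool file. */
  EV_SET(&change, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR,
         NOTE_WRITE | NOTE_EXTEND, 0, NULL);
  if (kevent(kq, &change, 1, NULL, 0, NULL) == -1)
    err(1, "kevent");

  for (;;) {
    /* Block until the spool changes, then run the handler. */
    if (kevent(kq, NULL, 0, &event, 1, NULL) == -1)
      err(1, "kevent");
    system(argv[1]);
  }
}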

2016-12-16 Self-compiling C files

A trick Scheme programmers use to make their scripts directly executable is to start them with a block comment and a line that immediately executes a Scheme interpreter:

#|
exec csi -s "$0" "$@"
|#

(print "Hello world!")

You can use a #define directive to achieve a similar result in C:

#define _ \
  cc "$0" && exec ./a.out "$@" || exit $?

#include <stdio.h>

int main(int argc, char **argv) {
  puts("Hello world!");
  return 0;
}
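
Trying it out looks something like this (assuming the file is saved as hello.c and marked executable); note that it recompiles on every run and leaves an a.out behind in the current directory:

$ chmod +x hello.c
$ ./hello.c
Hello world!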

2016-07-09 Creating empty files with Ansible

Creating a file when it doesn’t already exist is surprisingly hard to do with Ansible.

The file module has a state=file mode, but it doesn’t create files. Obviously.

It also has a state=touch mode, but that will unconditionally modify the file’s timestamp and register a “change” event, even when the file already exists.

I’ve settled on the following:

- file: path=foo state=touch
  register: touch
  changed_when: touch.diff.before.state == "absent"

This only works because variables are (apparently) defined before a decision is made about whether something has changed, but that’s good enough for me. There are, of course, other ways one might accomplish this, but I was after a single task and wanted to avoid the command and shell modules, if possible.

I don’t know why the file module’s state=file option doesn’t create files. Ansible’s docs suggest that one “see the copy or template module if you want that behavior”, but introducing a template just to create an empty file seems pretty naff to me.

2016-06-17 Automated OpenBSD Server Deployment

I use Vultr for server hosting, and while they have support for automatically installing Linux and Windows there’s currently no way to deploy OpenBSD from a release ISO without logging in to a VNC console and running the installer manually.

That said, it’s pretty easy to build a custom OpenBSD disk image that will self-install when booted. IANAOBSDD, but the project’s build scripts are very readable and easy to modify.

The steps involved are:

  1. Fetch the OpenBSD source code.
  2. Put an auto_install.conf file in the disk’s root. This will trigger automatic installation when the disk is booted.
  3. Optionally put a disklabel.conf file in the disk’s root. This will control how the disk is partitioned.
  4. Optionally put an etc.rc.firsttime script in the disk’s root. This will be appended to /etc/rc.firsttime in the installed system and run on first boot.
  5. Add these files to the install manifest for your architecture.
  6. Build a new CD image.

The files I used for steps 2-4 look roughly like this:

$ cat /usr/src/distrib/miniroot/auto_install.conf
System hostname = wunderbox
Password for root account = turtle
Do you expect to run the X Window System = no
Setup a user = evhan
Password for user = turtle
Use (W)hole disk = W
URL to autopartitioning template = file:/disklabel.conf
Location of sets = http
Set name(s) = -game*
Set name(s) = -x*
What timezone are you in = UTC

$ cat /usr/src/distrib/miniroot/disklabel.conf
/             512M-1G    8%
swap          1G
/tmp          512M-1G    8%
/var          4G-6G      30%
/usr          2G-6G      20%
/usr/local    2G-10G     20%
/home         1G-*       10%

$ cat /usr/src/distrib/miniroot/etc.rc.firsttime
echo 'ssh-rsa ...' >> /home/evhan/.ssh/authorized_keys

The firsttime script can do whatever you like; adjust according to taste. Also note that the disklabel template can be any URL supported by ftp(1), not just a local file.

Add these files to the build manifest for your architecture. Vultr uses amd64 boxes, so I modified distrib/amd64/common/list. I also added a single line to distrib/miniroot/install.sh so that my firsttime script runs when the new system is started. The final diff should look something like this:

$ cd /usr/src
$ cvs -q diff
? distrib/miniroot/auto_install.conf
? distrib/miniroot/disklabel.conf
? distrib/miniroot/etc.rc.firsttime
Index: distrib/amd64/common/list
===================================================================
RCS file: /cvs/src/distrib/amd64/common/list,v
retrieving revision 1.39
diff -u -p -r1.39 list
--- distrib/amd64/common/list   13 Apr 2015 21:27:07 -0000      1.39
+++ distrib/amd64/common/list   16 Jun 2016 23:05:03 -0000
@@ -84,3 +84,8 @@ SCRIPT        ${CURDIR}/../../miniroot/upgrade.
 SCRIPT ${CURDIR}/../../miniroot/install.sh     install
 SCRIPT ${CURDIR}/../../miniroot/install.sub    install.sub
 SPECIAL        chmod 755 install upgrade
+
+# autoinstall configuration
+COPY   ${CURDIR}/../../miniroot/auto_install.conf auto_install.conf
+COPY   ${CURDIR}/../../miniroot/disklabel.conf    disklabel.conf
+COPY   ${CURDIR}/../../miniroot/etc.rc.firsttime  etc.rc.firsttime
Index: distrib/miniroot/install.sh
===================================================================
RCS file: /cvs/src/distrib/miniroot/Attic/install.sh,v
retrieving revision 1.275
diff -u -p -r1.275 install.sh
--- distrib/miniroot/install.sh 11 Feb 2016 14:24:28 -0000      1.275
+++ distrib/miniroot/install.sh 16 Jun 2016 23:05:03 -0000
@@ -314,3 +314,6 @@ pwd_mkdb -p -d /mnt/etc /etc/master.pass
 
 # Perform final steps common to both an install and an upgrade.
 finish_up
+
+# Add custom firsttime commands.
+[ -f /etc.rc.firsttime ] && cat < /etc.rc.firsttime >> /mnt/etc/rc.firsttime

Build the CD image:

$ make -C /usr/src/distrib/special
$ make -C /usr/src/distrib/amd64
$ file /usr/obj/distrib/amd64/cdfs/cd59.iso
/usr/obj/distrib/amd64/cdfs/cd59.iso: ISO 9660 CD-ROM filesystem data 'OpenBSD/amd64   5.9 boot-only C' (bootable)

Use cd59.iso (or whatever version you’ve just built) as a custom image when creating a new server and it will auto-install when booted.

One-time ISO Boot

This approach still requires that you detach the install media manually once the installation finishes. If you don’t, the server will continue to boot from cd59.iso even after the OS is present, leading to a never-ending install/reboot cycle.

As a workaround you can trigger the installation from an iPXE script, which will run just once. A Syslinux MEMDISK can be used to chain-load the ISO, for example:

#!ipxe
initrd http://example.com/cd59.iso
chain http://example.com/memdisk iso raw

Properly securing this approach is left as an exercise for the reader.


2016-05-30 Command line keyword arguments

Lately I’ve been using the following trick as a quick and dirty way to handle command line arguments:

(define (command-line-data)
  (map (lambda (s) (call-with-input-string s read))
       (command-line-arguments)))

(define (main #!key (foo 1) (bar 'abc) (baz '()))
  (printf "~s ~s ~s~n" foo bar baz))

(apply main (command-line-data))

The command-line-data procedure just converts a program’s arguments to a list of Scheme data, which isn’t too interesting by itself. The real trick is to write your main procedure so that it accepts whatever keyword arguments you want your program to accept, then invoke your program as though you were calling it from Scheme code:

$ csc -R ports program.scm
$ ./program foo: -1 bar: xyz baz: '(1 2 3)'
-1 xyz (1 2 3)

I still haven’t decided whether this is terribly clever or just plain terrible.

2016-05-24 A one-word bug in the POSIX specification of fflush

We recently encountered a fun issue in CHICKEN relating to file handles, fork(), and exit() that turned out to be caused by a very literal interpretation of the POSIX specification of fflush().

The main description of the function is divided into three paragraphs:

  1. The first describes the flushing behaviour of output streams.
  2. The second specifies that fflush(NULL) will flush all streams “for which the behavior is defined above” (emphasis mine).
  3. The third, a POSIX.1-2008 extension, describes the flushing behaviour of input streams.

As a result of that single word “above”, a strictly-conformant libc will implement fflush(NULL) so that only output streams are flushed, despite the fact that the behaviour of fflush() on input streams is perfectly well-defined in POSIX.1-2008. Indeed, that’s exactly what musl does, which is why the bug was first encountered with that C library.
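
A minimal illustration of the difference (this is not the CHICKEN code itself, and it assumes a seekable file named data exists in the current directory):

#include <stdio.h>
#include <unistd.h>

int main(void) {
  FILE *in = fopen("data", "r");
  if (in == NULL)
    return 1;

  fgetc(in); /* stdio reads ahead, so the fd offset is now well past byte 1 */

  fflush(NULL); /* a strictly-conformant libc may flush only output streams */
  printf("after fflush(NULL): %ld\n", (long)lseek(fileno(in), 0, SEEK_CUR));

  fflush(in); /* POSIX.1-2008: reposition the fd to the stream's read position */
  printf("after fflush(in):   %ld\n", (long)lseek(fileno(in), 0, SEEK_CUR));
  return 0;
}

On a libc whose fflush(NULL) skips input streams, as described above, the first line shows the buffered read-ahead offset; the second prints 1 either way.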

It’s a shame, since omitting that one word would have resulted in a much more coherent specification.

Credit goes to Kooda for figuring all of this out, and to TheLemonMan for the initial bug report.

2016-05-28 Update

It looks like a correction for this problem has already been proposed and accepted by the POSIX working group. The fix, which will presumably be included in the next version of the spec, simply swaps the second and third paragraphs of the function’s description.

2016-05-10 Packaging self-contained CHICKEN programs

This is a short walk through the process of packaging a dynamically-linked CHICKEN 4.x application for binary deployment.

Note that this does not apply to CHICKEN 5.x! Newer versions of CHICKEN support static compilation, which is the preferred option.

CHICKEN 4.x on the other hand doesn’t prescribe a process for this, and there are lots of ways to achieve similar results, but this is one approach that has worked well for me in the past.

But First: The Easy Option

Before I describe how to manually bundle an application, I should mention the automatic method that’s almost guaranteed to work for normal programs:

$ csc -deploy <program>.scm
$ chicken-install -init <program>
$ chicken-install -deploy -prefix <program> [<egg> ...]

That’s it. These three commands:

  1. Compile a program and place it, along with the CHICKEN runtime, into a directory named after the input file.
  2. Copy all of CHICKEN’s standard libraries into the directory.
  3. Install eggs, including their dependencies, into the directory.

The reason I’m going to explain an alternative, albeit more complicated, approach to packaging is that this simple method can potentially result in very large application bundles. This is because it bundles every file from the installed eggs that your program could possibly use at runtime, regardless of whether they’re truly needed. This can be wasteful of disk space, but if package size isn’t an issue I suggest you take the easy route.

Bundling the Runtime

If you’re still reading, you’ve picked the hard option. Good choice. As an example program we’ll be packaging a CHIP-8 emulator I wrote in the spirit, though not the deed, of itch.io’s Spring 2016 Lisp Game Jam. To start with, clone the project and fetch its dependencies:

$ git clone -b chicken-4.x https://git.foldling.org/chick-8.git
$ cd chick-8
$ chicken-install -s -x begin-syntax ncurses r7rs

This will download and install the eggs this program uses into CHICKEN’s extension repository. You can verify that they’ve been installed with chicken-status, and you can see what files they’ve included with chicken-status -f begin-syntax ncurses r7rs.

The commands to actually build the project look like this:

$ csc -X chick-8-syntax -R ncurses chick-8-curses.scm -c
$ csc -X chick-8-syntax -R r7rs -X r7rs -uses chick-8-curses chick-8.scm chick-8-curses.o

Don’t worry too much about the specifics here, just know that these two commands compile the program, and the second is the one that actually produces an executable. If you run the resulting file, it should print a usage message:

$ ./chick-8
usage: chick-8 [options ...] <file>
[... more output ...]

We now have a binary that works fine locally, but requires that the host system have CHICKEN installed. If you were to copy it onto another machine with the same architecture but without a CHICKEN runtime, it would fail with an error like the following:

$ ./chick-8
./chick-8: error while loading shared libraries: libchicken.so.8: cannot open shared object file: No such file or directory

Luckily, the “CHICKEN runtime” is just a single shared library and, as with the easy option, we can use the compiler’s “-deploy” flag to automatically bundle it with our program in a directory:

$ rm chick-8
$ csc -deploy -X chick-8-syntax -R r7rs -X r7rs -uses chick-8-curses chick-8.scm chick-8-curses.o
$ ls chick-8
chick-8
libchicken.so.8
$ ldd chick-8/chick-8
    linux-vdso.so.1 (0x00007ffc783e4000)
    libchicken.so.8 => /home/evhan/chick-8/chick-8/libchicken.so.8 (0x00007f6b8f402000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6b8f0de000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6b8eed9000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6b8eb35000)
    /lib64/ld-linux-x86-64.so.2 (0x000055f8feeca000)

There’s lots more information about this feature in the CHICKEN manual.

Bundling Eggs

At this point if you run the resulting program you’ll hit an error:

$ chick-8/chick-8

Error: (require) cannot load extension: ncurses

We’ve bundled the CHICKEN runtime, but we’re still missing the program’s egg dependencies. Luckily, eggs are mostly just shared libraries that can be copied into our application directory just like “libchicken.so”. The tricky part is usually figuring out exactly what files to copy, but the CHICKEN compiler has a (slightly obscure) option that can help with that.

Returning to our csc(1) invocations, if we add “-debug M” to both, we’ll see a list of “static” and “dynamic” library requirements for each file. Don’t worry about the static requirements – those are handled at compile time – but do take note of the dynamic ones:

$ csc -X chick-8-syntax -R ncurses chick-8-curses.scm -c -debug M
; requirements:
((static chicken-syntax eval library srfi-4 srfi-18)
 (dynamic ncurses))
$ csc -X chick-8-syntax -R r7rs -X r7rs -uses chick-8-curses chick-8.scm chick-8-curses.o -deploy -debug M
; requirements:
((static posix chick-8-curses library eval chicken-syntax ports srfi-18)
 (dynamic numbers scheme.base r7rs))

From this we can tell that we need to copy at least1 the “ncurses”, “numbers”, “scheme.base”, and “r7rs” libraries from the extension repository into our program directory. We could do this manually for each object:

$ cp "$(chicken-install -repository)/r7rs.so" chick-8
[... and so on ...]

But it’s easier to just use chicken-status(1) to list the files in the repository and filter for the ones we need:

$ chicken-status -f | \
   grep -E '(ncurses|numbers|r7rs|scheme.base).so' | \
   xargs -I % cp % chick-8
$ ls chick-8
chick-8
libchicken.so.8
ncurses.so
numbers.so
r7rs.so
scheme.base.so
$ chick-8/chick-8
usage: chick-8 [options ...] <file>
[... more output ...]

With that, we have a freely-relocatable application bundle that requires nothing of its host but a working C library2.

Packaging the Result

With this self-contained version of our program, we’re not too far from a package in a format such as deb(5) or rpm(8).

Most OS-specific package formats boil down to a handful of archive files, usually compressed, containing an application’s files and a bunch of metadata. Learning the vagaries of any one of these formats is quite a task, let alone trying to support many of them at once. Thankfully, there are tools that can help.

One such tool that I like is fpm. That link has the details, but for our purposes it’s enough to know that we can hand fpm a directory and it will give us a package in return.

Let’s build that directory. We’ll call it “package”, copy our bundled files inside, and link to the binary itself from /usr/bin:

$ mkdir -p package/usr/bin package/usr/libexec
$ cp -fR chick-8 package/usr/libexec
$ ln -fs /usr/libexec/chick-8/chick-8 package/usr/bin/chick-8

Now we just pass our “package” directory off to fpm. In this example I’m building a Debian package, but you can control the output format with the -t argument:

$ gem install fpm # if necessary
$ fpm -s dir -t deb -a native -C package \
      --name chick-8 \
      --version 0.0.1 \
      --license BSD \
      --description 'CHIP-8 emulator' \
      --depends "libc6 (>= 2.3)" \
      --depends "libncurses5 (>= 5.0)" .
Created package {:path=>"chick-8_0.0.1_amd64.deb"}
$ dpkg --info chick-8_0.0.1_amd64.deb
 new debian package, version 2.0.
 size 1760812 bytes: control archive=619 bytes.
     291 bytes,    12 lines      control
     498 bytes,     7 lines      md5sums
 Package: chick-8
 Version: 0.0.1
 License: BSD
 Vendor: evhan@capsaicin
 Architecture: amd64
 Maintainer: <evhan@capsaicin>
 Installed-Size: 5450
 Depends: libc6 (>= 2.3), libncurses5 (>= 5.0)
 Section: default
 Priority: extra
 Homepage: http://example.com/no-uri-given
 Description: CHIP-8 interpreter
$ dpkg --contents chick-8_0.0.1_amd64.deb
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./usr/
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./usr/share/
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./usr/share/doc/
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./usr/share/doc/chick-8/
-rw-r--r-- 0/0             138 2016-05-14 14:32 ./usr/share/doc/chick-8/changelog.gz
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./usr/
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./usr/bin/
lrwxrwxrwx 0/0               0 2016-05-14 14:32 ./usr/bin/chick-8 -> /usr/libexec/chick-8/chick-8
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./usr/libexec/
drwxr-xr-x 0/0               0 2016-05-14 14:32 ./usr/libexec/chick-8/
-rwxr-xr-x 0/0         4583656 2016-05-14 14:32 ./usr/libexec/chick-8/libchicken.so.8
-rwxr-xr-x 0/0          173936 2016-05-14 14:32 ./usr/libexec/chick-8/ncurses.so
-rwxr-xr-x 0/0          150904 2016-05-14 14:32 ./usr/libexec/chick-8/scheme.base.so
-rwxr-xr-x 0/0           21640 2016-05-14 14:32 ./usr/libexec/chick-8/r7rs.so
-rwxr-xr-x 0/0          188096 2016-05-14 14:32 ./usr/libexec/chick-8/chick-8
-rwxr-xr-x 0/0          463536 2016-05-14 14:32 ./usr/libexec/chick-8/numbers.so

As you can see, fpm has added a changelog and filled in some default metadata for us. You’ll typically want to tweak this metadata to reflect reality, which can be done with the program’s many command-line flags (see fpm --help).

While a package built this way is totally unsuitable for inclusion in a distro’s official repositories – I’ve no idea how many RPM packaging guidelines something like this breaks, for example, but I’m pretty sure it’s not none – it’s a fine way to distribute packages within your own infrastructure, or to end users who just want an easy way to install your software.


  1. One major problem that this article totally glosses over is that of transitive dependencies: if an egg your program uses depends on another library at runtime, you’ll need to copy that one too.↩︎

  2. That’s a lie, this example also requires a working ncurses library.↩︎

2015-10-26 Using preprocessor macros in GDB

Compiling with gcc -g3 preserves preprocessor macros so they can be used from within GDB. Very handy.
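
For example, given a file like the following (the names here are mine, just for illustration):

/* scratch.c */
#include <stdio.h>

#define SQUARE(x) ((x) * (x))

int main(void) {
  printf("%d\n", SQUARE(3));
  return 0;
}

A session might look roughly like this (exact output varies by GDB version):

$ gcc -g3 -o scratch scratch.c
$ gdb -q ./scratch
(gdb) break main
(gdb) run
[...]
(gdb) macro expand SQUARE(4)
expands to: ((4) * (4))
(gdb) print SQUARE(5)
$1 = 25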

2015-09-22 Vim as Sed

I keep the following in ~/bin/vimdo:

vim -es /dev/stdin +"$*" +'q!'

It runs a single Vim command on standard input, then quits.

An example use case from the last few days: selecting the range of lines between “foo” and “bar”, not including either. I know I can use address offsets in Vim for this, so I do: vimdo '/foo/+1, /bar/-1 p' < file. There may be a similar feature in sed(1) and there’s almost certainly a way to do it in awk(1), but reusing the command language with which one spends hours a day makes sense to me.

2015-04-07 Finding a pathname of the current executable

This is a surprisingly platform-specific task.
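
For illustration, the procfs approach from the first footnote looks roughly like this on Linux – a readlink(2) of /proc/self/exe. This is a sketch of that one case only, not a portable solution:

#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
  char buf[PATH_MAX];
  ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf) - 1);

  if (len == -1)
    return 1;

  buf[len] = '\0'; /* readlink(2) does not NUL-terminate */
  printf("%s\n", buf);
  return 0;
}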


  1. This is only possible when procfs is enabled.↩︎

  2. procfs is also a possibility if it’s enabled, though this feature is deprecated on FreeBSD.↩︎

  3. Note, however, that “libproc.h” doesn’t appear to be part of any public API. Buyer beware.↩︎

  4. And don’t forget to hope and pray.↩︎

2015-01-27 snow2 repository

I’ve created a snow2 repository containing some of my pure-R7RS Scheme libraries. I hope to include the rest shortly.

It’s available here, and ought to be compatible with sethalves’ snow2-client and ashinn’s snow-chibi programs.

2014-01-20 Joe Armstrong's Universal Server in Various Languages

Joe Armstrong’s favorite Erlang program is a pretty fun one to transcribe.

The original looks like this:

universal_server() ->
   receive
      {become, F} ->
          F()
   end.

In Scheme, using Termite:

(define (universal-server)
  (spawn
   (lambda ()
     (recv (('become f)
            (f))))))

In Go, using channels1:

type Message interface {}

type Become struct {
  f func()
}

func universalServer() chan Message {
  c := make(chan Message)
  go func() {
    switch v := <- c; v.(type) {
      case Become:
        close(c)
        v.(Become).f()
    }
  }()
  return c
}

In Lua, using ConcurrentLua2:

concurrent = require 'concurrent'

-- Expects messages of the form { t = "become", f = function () ... end }
universal_server = function ()
  return concurrent.spawn(function ()
    local msg = concurrent.receive()
    if msg.t == "become" then
      msg.f()
    else
      -- drop
    end
  end)
end

In Scala, using Akka:

case class Become(f : () => _)

class UniversalServer extends Actor {
  def receive = {
    case Become(f) => f()
  }
}

  1. This is slightly off, since it isn’t networked (but you can just pretend c is a net.Read sink), and universalServer receives from c unconditionally. The channel should really be buffered, with universalServer ignoring non-Become Messages.↩︎

  2. Again, the server should ignore messages where msg.t != "become".↩︎

2014-01-08 Assorted mail scripts

Use Vim as a colorizing pager:

#!/bin/sh

fold -s \
  | vim -R - \
    -c "sil run macros/less.vim" \
    -c "sil set ft=mail" \
    -c "sil set shm+=T"

Safely bogofilter an online mail spool:

#!/bin/ksh

set -o monitor

if temp=$(mktemp -q)
then
  /usr/libexec/lockspool |&
  if read -p ok && [ "$ok" = 1 ]
  then
    bogofilter -pM < "$MAIL" > "$temp"
    if [ $? -lt 3 ]
    then cat < "$temp" > "$MAIL"
    else error=1
    fi
  fi
  rm -f "$temp"
  exec 3>&p 3>&-
  exit $error
fi

2013-10-11 Asymmetric encryption for s3cmd

s3cmd(1) supports automatic encryption & decryption of data, but it defaults to using a symmetric cypher and passphrase. Happily, it’s straightforward to configure it to use a public key instead, for those who are so inclined.

The important directives in $HOME/.s3cfg are:

gpg_decrypt = %(gpg_command)s -d --verbose --use-agent --batch --yes -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -e --verbose --use-agent --batch --yes -R <key-id> -o %(output_file)s %(input_file)s
gpg_passphrase = Ignored

Replace <key-id> with yours.

Note that gpg_passphrase is required but unused (s3cmd will whine and quit if it’s unset).

(If you don’t have a $HOME/.s3cfg, create it with s3cmd --configure.)

2013-09-10 The limits of size_t and ptrdiff_t

Given a block of memory, you can probably express its size as a value of type size_t. This isn’t actually guaranteed – size_t is only required to represent values up to 65535, the minimum permitted value of SIZE_MAX – but it’s pretty safe to assume, since AFAIK just about everybody defines size_t according to the machine’s word length.

However, given two pointers, it’s not nearly as certain that you can express their difference with a ptrdiff_t. Because ptrdiff_t is signed (in order to indicate the direction of the difference), its maximum value is effectively SIZE_MAX / 2 (strictly speaking PTRDIFF_MAX, also from stdint.h, though the two values are equal on common platforms), and the behavior is undefined if the result of the subtraction doesn’t fit in its range. This strikes me as a far more likely limitation to run up against.
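
A quick way to see the headroom difference on a given machine is to print both limits; on a typical LP64 system PTRDIFF_MAX comes out to exactly SIZE_MAX / 2 (a sketch, and the output is of course platform-dependent):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
  printf("SIZE_MAX    = %zu\n", (size_t)SIZE_MAX);
  printf("PTRDIFF_MAX = %td\n", (ptrdiff_t)PTRDIFF_MAX);
  return 0;
}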

2013-07-16 Compiling statically-linked CHICKEN Scheme programs with extensions

While static compilation with extensions isn’t officially supported by the CHICKEN 4.x toolchain (chicken-install(1) et al.), it’s usually possible. However, doing so means compiling the extension and all of its dependencies yourself.

A simple example of this follows. The general strategy is to:

  1. Fetch each extension’s source.
  2. Compile each extension into units and import files.
  3. Compile your program in the presence of the import files.
  4. Link your program with the units.

All of the information here is available in more detail in the CHICKEN manual – I’ve just extracted the relevant parts and added a narrative.

Also, note that this post only applies to CHICKEN 4.x. In CHICKEN 5.0 static linking is handled automatically when you pass the “-static” flag to csc(1) and the C compiler, so none of the following contortions are required.


Our goal is to produce a statically-linked program that uses the uri-generic extension, which itself uses the matchable and defstruct extensions.

We’ll start with matchable. Fetch its source with chicken-install -retrieve.

$ chicken-install -retrieve matchable > /dev/null
$ ls matchable
matchable.meta
matchable.scm
matchable.setup
matchable-test.scm
match-simple.scm

Each extension includes several files we’re not interested in1, and then one – if we’re lucky – primary source file. For this extension, that’s matchable.scm.

To use this file statically, we compile it to an ELF object as a “unit”, then link that object into our final program.

$ cat program.scm; echo
(use matchable)
(match '(1 . 2)
  ((x . y)
   (display "Hooray!")
   (newline)))

$ csc -unit matchable -emit-import-library matchable -c matchable/matchable.scm -o matchable.o
$ ls matchable.*
matchable.import.scm
matchable.o
$ csc -uses matchable -c program.scm -o program.o
$ csc -static program.o matchable.o -o program
$ file program
program: ELF 64-bit LSB  executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=7678fccb131e4f92a0ef199f985b110079df7dbf, not stripped
$ ./program
Hooray!

So far, little has changed from the manual’s example of compiling multiple files. The important differences are:

  1. The addition of the -unit and -uses flags.
  2. The addition of the -emit-import-library flag.

The -unit and -uses flags enforce the compilation model described in the manual as CHICKEN’s basic mode of operation, despite the fact that neither matchable.scm nor program.scm declare any units as described2.

The -emit-import-library flag causes one additional file, matchable.import.scm, to be produced when compiling matchable.scm. This file contains compile-time information about the matchable module that’s used by csc when it’s imported into another program3.

Now, do the same for defstruct.

$ cat program.scm; echo
(use matchable defstruct)
(defstruct point x y)
(match (make-point x: 1 y: 2)
  (($ point 1 2)
   (display "Yippee!")
   (newline)))

$ chicken-install -r defstruct > /dev/null
$ csc -unit defstruct -cJ defstruct/defstruct.scm -o defstruct.o
$ csc -uses matchable,defstruct -c program.scm -o program.o
$ csc -static program.o defstruct.o matchable.o -o program
$ file program
program: ELF 64-bit LSB  executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=4cd22ac0bd74ba3721fbfb6c4dc8dc2eb5e4a170, not stripped
$ ./program
Yippee!

The binary now includes both matchable and defstruct4.

You can see where this is going. For each extension we use, and for each extension any of those extensions use (and so on, recursively – this is Scheme, after all), build an object with -unit and declare it as a dependency of every file that uses it with -uses.

One last time, with uri-generic (remember, that’s why we started down this rabbit hole in the first place). Here, uri-generic uses matchable and defstruct, while our program uses only uri-generic.

$ cat program.scm; echo
(use uri-generic)
(write (uri-path (uri-reference "http://example.com/super/happy/fun/time")))
(newline)

$ chicken-install -r uri-generic > /dev/null
$ csc -unit uri-generic -uses matchable,defstruct -cJ uri-generic/uri-generic.scm -o uri-generic.o
$ csc -uses uri-generic -c program.scm -o program.o
$ csc -static program.o uri-generic.o matchable.o defstruct.o -o program
$ ./program
(/ "super" "happy" "fun" "time")

We now have a static binary containing all of the required extensions.

Note that, when it comes time to link our program, an object for every extension that’s used must be included in the resulting binary. This is tedious, but because they’re standard ELF objects, they can be bundled into archives, making them easier to manage in bulk.

$ ar rc uri-generic.a uri-generic.o matchable.o defstruct.o
$ rm *.o
$ csc -uses uri-generic -static program.scm uri-generic.a
$ ./program
(/ "super" "happy" "fun" "time")

There you have it. It’s a bit of work, but the same can be done for most extensions, and the ability to produce a self-contained program that makes use of the many eggs in CHICKEN’s coop is nice when needed.


  1. The setup and meta files in particular are used by CHICKEN’s package manager.↩︎

  2. In fact, the -unit and -uses flags are equivalent to inserting those declarations manually. Again from the manual:

    -unit NAME
        Compile this file as a library unit.
        Equivalent to -prelude "(declare (unit NAME))"
    -uses NAME
        Use definitions from the library unit NAME.
        This is equivalent to -prelude "(declare (uses NAME))".
    ↩︎
  3. Indeed, because CHICKEN’s module system is entirely syntactic, this file can be discarded once our program has been compiled. Import libraries are described in more detail here.↩︎

  4. A closer look at matchable and defstruct will show that they’re both syntax-only extensions, so it’s technically unnecessary to compile them into objects at all. Because macros are included in a module’s import library, it’s enough to produce matchable.import.scm and defstruct.import.scm and have those files available (i.e. present in the current directory) when compiling program.scm. This makes them bad examples in particular, but this article gives a general approach that will work with other extensions, too.↩︎

2013-02-17 Falsity in JavaScript

I never remember these. Perhaps writing them down will help.

> !! 0
false
> !! ""
false
> !! {}
true
> !! []
true
> !! NaN
false
> !! null
false
> !! undefined
false

If you really want to feel rage, try comparing some of these with ==, which is notoriously dangerous. For example:

> '' == 0
true
> 0 == '0'
true
> '' == '0'
false

(What, you thought equality was transitive? Don’t be silly.)

Because of things like this, I think there’d be value in a “CoffeeScript lite” that rewrites the absurd parts of JavaScript (such as replacing all instances of == with === as CoffeeScript does, or fixing instanceof so that "abc" instanceof String is true) without implementing an entirely new syntax.

2013-01-20 Simple dtach session manager

One of the most useful scripts in my bin is dt. It manages named dtach(1) fifos under $HOME/.dtach.

#!/bin/sh

name=$1
dir=$HOME/.dtach

usage() {
  echo "usage: $(basename "$0") [-n name] cmd"
}

if [ $# = 0 ]
then
  ls -1 "$dir"
else
  mkdir -p "$dir"
  while getopts hn: arg; do
    case $arg in
      ?) usage>&2; exit 1;;
      n) name=$OPTARG; shift 2;;
      h) usage; exit;;
    esac
  done
  DTACH="$(date +%s)-$name" \
    dtach -A "$dir/$name" -r winch "$@"
fi

Called without arguments, it lists running sessions. Called with a command line, it either resumes the session with the name of the program to be run, or starts a new one if none exists. An alternate session name can be given with -n.

I use this command a lot:

$ fc -ln 0 | wc -l
9367
$ fc -ln 0 | grep -c '^ dt[ $]'
1264

2014-06-04

abduco(1) by Marc André Tanner of brain-dump.org (author of dvtm(1)) provides exactly this feature out of the box.

2013-01-18 Editing Scheme with Vim

For other kooks like me who refuse to use Emacs.

2018-02-15 Update

Much of the following information is out of date as of Vim 8.1, which incorporates my vim-scheme plugin and thus includes many of the changes described here without any extra configuration.

Indentation

Vim’s built-in Lisp indentation (“set lisp”) is hardcoded to shift inner forms by two spaces. I much prefer the indentation style used by the various emacsen, where identifiers not in lispwords are vertically aligned; this patch against Vim 7.3 emulates this style.

Alternatively, you can use my schematic-format utility as a source formatter. To configure it as Vim’s = operator for Scheme files, use the following command:

au FileType scheme setl equalprg=schematic-format

Lispwords

If you choose not to use an 'equalprg', getting lispwords right can ease a lot of manual-indentation pain. The most important change to make is…

au FileType scheme setl lispwords-=if

… Which vertically aligns the arms of if expressions.

Syntax

Vim colors parentheses as Delimiters by default, which is too bright for my taste in most colorschemes. Additionally, I like to have a visual indication of which parentheses are parts of quoted forms, as opposed to normal code. This patch colors “normal” parentheses as Comments (which are light grey in my colorscheme), while parentheses in quote and quasiquote forms remain colored as Delimiters (red, with inner unquotes returning parentheses to the lighter Comment color).

I also maintain reasonably complete syntax files for R7RS Scheme and CHICKEN’s extensions to the language. To use them, either install my vim-scheme plugin with your plugin manager of choice or place the following files in the corresponding directories under ~/.vim/:

REPL

I’ve written about simple REPL integration here. I still use a slight variation on that approach.

Plugins

Surround

If you use vim-surround, the following line adds a binding to the s key that surrounds forms in a function application, so that, for example, yssscons wraps the current line in (cons ...).

let g:surround_115 = "(\1function: \1 \r)"

Smart Input

lexima is the best autoclose plugin I’ve found so far. It’s based on the solid vim-smartinput, but provides the added benefit of dot-repeatable insertions.

It also works well for Scheme, with minor customization. The following lines in ~/.vimrc prevent it from autoclosing quote and quasiquote characters:

call lexima#add_rule({ 'char': "'",  'input': "'", 'filetype': ['lisp', 'scheme'] })
call lexima#add_rule({ 'char': "`",  'input': "`", 'filetype': ['lisp', 'scheme'] })

CHICKEN

Documentation

Setting keywordprg gives you access to chicken-doc’s documentation for a given identifier when K is pressed over it.

setl keywordprg=chicken-doc

Completion

Vim

The following script dumps function names from chicken-doc into files named by node under ~/.vim/completion/scheme.

#!/bin/sh -e

dir="$HOME/.vim/completion/scheme"

first() { cut -d' ' -f1; }

mkdir -p "$dir"

for n in $(chicken-doc -c chicken | first)
do
  chicken-doc -c chicken "$n" | first | tee "$dir/chicken-$n"
done

for n in $(chicken-doc -c | first)
do
  [ "$n" = "chicken" ] || chicken-doc -c "$n" | first | tee "$dir/$n"
done

These files can be sourced to add completion for CHICKEN identifiers by adding the following to ~/.vim/ftplugin/scheme.vim:

setl complete+=d,k~/.vim/completion/scheme/**

Obviously, the same could be done for any other language, given a way to generate its wordfile(s).

csi

The same wordfiles can be used for tab completion in csi with rlwrap.

cat ~/.vim/completion/scheme/* > ~/.csiwords

rlwrap -f "$HOME/.csiwords" csi

2013-01-06 An inotify gotcha

To recursively watch a directory, one typically monitors IN_CREATE events and, if the subject is a directory, inotify_add_watches it as well.

However, this introduces the potential for what are essentially TOCTTOU bugs: by the time you’ve inotify_add_watched a newly created directory, it may already have been the subject of events you’re interested in. To correct for this, you have to add the watch and then immediately walk the directory, acting on any files created before the watch was registered, before returning to handle new events from the watch itself.
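
A sketch of the resulting “add the watch, then rescan” pattern follows. The handle_file() helper and the wd-to-path bookkeeping a real program needs are omitted; watch_dir() would be called both for the root directory and from the IN_CREATE | IN_ISDIR branch of the event loop:

#include <sys/inotify.h>

#include <dirent.h>
#include <string.h>

void handle_file(const char *dir, const char *name); /* whatever the program does */

void watch_dir(int fd, const char *dir) {
  DIR *d;
  struct dirent *e;

  if (inotify_add_watch(fd, dir, IN_CREATE) == -1)
    return;

  /* Files may have been created in this directory before the watch was
   * registered, so walk it immediately to catch anything we missed. */
  if ((d = opendir(dir)) != NULL) {
    while ((e = readdir(d)) != NULL)
      if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
        handle_file(dir, e->d_name);
    closedir(d);
  }
}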

2012-12-01 Python scoping rules

(No, it doesn’t.)

Python’s an OK language, but its scoping rules are pretty broken.

>>> foo = 1
>>> def incfoo():
...     foo += 1
...     return foo
...
>>> incfoo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in incfoo
UnboundLocalError: local variable 'foo' referenced before assignment

I’ve been told the “correct” way to do this in Python is using containers…

>>> foo = [1]
>>> def incfoo():
...     foo[0] += 1
...     return foo[0]
...
>>> incfoo()
2

… Which is insane.

Even more sinister is the way list comprehensions brutally pollute their surrounding scope:

>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> for x in [1, 2, 3]:
...     x + 1
...
2
3
4
>>> x
3
>>> [x + 1 for x in [4, 5, 6]]
[5, 6, 7]
>>> x
6

This has been fixed in Python 3, thankfully. For 2.x releases, you can work around it with map (for the latter case, anyway).

2012-08-05 Poor man's SLIME

I’m not an Emacs user and I don’t care for slimv, but the ability to send text from $EDITOR to a running REPL a la SLIME is invaluable. For a while I used a modified version of tslime.vim, but I’ve since ditched it for something much simpler.

In sh(1):

mkfifo repl
cat > repl & $command < repl

In vim(1):

nmap <C-c><C-c> vap:w >> repl<CR>

That’s it. ^C^C sends the toplevel expression under the cursor to the $command reading from repl, no plugins or external tools are needed, and concerns are separated correctly: the editor and repl are isolated by pty and the view is managed by the window manager, as it should be.

2012-04-08 Tricksy byte to ASCII hex value conversions

short n = 10;
n["0123456789ABCDEF"]; /* => 'A' */

It works because array indexing commutes in C: n[s] is defined as *(n + s), which is the same as s[n]. Too cute by half.

2012-03-18 Three ways to hook a library function in C

Say one wanted to modify malloc such that it punted on allocations larger than 1024 bytes.

At runtime, by library interposition using LD_PRELOAD. This is the most common method. With the following as hook.c:

/*
 * gcc -shared -fPIC -D_GNU_SOURCE -ldl -o hook.so hook.c
 * LD_PRELOAD=./hook.so ./program
 */

#include <dlfcn.h>
#include <stdio.h>

void *malloc(size_t size) {
  typeof(malloc) *orig_malloc;

  if (size > 1024) {
    fprintf(stderr, "Whoa, %zu is a lot of bytes... I don't think I'm up for that.\n", size);
    return NULL;
  }

  orig_malloc = dlsym(RTLD_NEXT, "malloc");
  return orig_malloc(size);
}

At compile time by macro-redefinition of malloc:

/*
 * echo '#include "hook.c"' | cat - program.c > hooked.c
 * gcc -o program hooked.c
 * ./program
 */

#include <stdio.h>
#include <stdlib.h>

void *hook_malloc(size_t size) {
  if (size > 1024) {
    fprintf(stderr, "Whoa, %zu is a lot of bytes... I don't think I'm up for that.\n", size);
    return NULL;
  }
  return malloc(size);
}

#define malloc(n) hook_malloc(n)

At link time using the linker’s --wrap feature, which rewrites malloc to __wrap_malloc and __real_malloc to malloc:

/*
 * gcc -Wl,--wrap,malloc -o program hook.c program.c
 * ./program
 */

#include <stdio.h>
#include <stdlib.h>

typeof(malloc) __real_malloc;

void *__wrap_malloc(size_t size) {
  if (size > 1024) {
    fprintf(stderr, "Whoa, %zu is a lot of bytes... I don't think I'm up for that.\n", size);
    return NULL;
  }
  return __real_malloc(size);
}

2012-03-17 Heroku buildpack for CHICKEN Scheme

Last week I put together a Buildpack for deploying CHICKEN Scheme apps on Heroku’s Cedar stack.

It comes with CHICKEN 4.7.0 and uses the egg packaging infrastructure to manage dependencies.

The code is available at Github; instructions and examples can be found in the README.

2012-02-28 Multivariable negativity test

To test multiple signed values for negativity all at once, one can use…

if ((a|b|c) < 0)

… Which might be marginally more clear than writing out each comparison individually, kind of, maybe? It works because any negative operand sets the sign bit of the bitwise OR.

This also works (i.e. “reads logically”) with &, which instead tests that all of the values are negative.

2012-02-24 My favorite line of Ruby

… that I’ve ever written:

begin return yield ensure unlock end if block and lock

Totally nonsensical and perfectly valid, like a Ruby version of Perl line noise. For context, it’s from a tiny lockfile class.

class Lockfile
  def initialize(file)
    @file = file
  end
  def locked?
    File.exists?(@file)
  end
  def lock(&block)
    begin return yield ensure unlock end if block and lock
    not locked? and File.open(@file, "w") { true } rescue false
  end
  def unlock
    File.delete(@file) and true rescue false
  end
end

2012-02-20 Swapping values without a temporary variable

a ^= b ^= a ^= b;

Note that, because this expression modifies a and b more than once without an intervening sequence point, its result is technically undefined (though it can be rewritten as three equivalent statements with well-defined results).
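
For reference, the well-defined three-statement version, with a sequence point between each assignment:

a ^= b;
b ^= a;
a ^= b;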

2012-02-19 First impressions of Clojure

Some stream-of-consciousness thoughts after a little while using Clojure.

Firstly, it’s far more frustrating to know that a function exists but have no idea what it’s called or where it’s located than to have no idea what you’re looking for in the first place. As a Schemer, I’m constantly wanting to type member, find, fold, &c, but finding the equivalents in the Clojure documentation takes more time than I’d like (i.e. greater than 2-3 seconds). This isn’t Clojure’s fault by any means, just irritating. clojure.repl’s apropos helps with this, but from what I can tell it doesn’t reach outside the current namespace to find functions.

The most surface-level observation is the heavy use of reader macros. There’s far too much syntax for me; when reading (what I assume to be) idiomatic Clojure in the standard library, I often feel as though I’m drowning in it. Some forms are admittedly quite useful (the literal map syntax is one I like), but some (e.g. #(), which is basically just cut) seem unnecessary and detrimental to readability.

Speaking of readability, I find the literal sequential syntax ([]) very hard to work around. Admittedly, I am of the One True Parenthese (or is that Two?) in that I find square brackets a hindrance even when they’re equivalent to parentheses, but with the difference in behavior that Clojure introduces, keeping them all wrangled becomes important and thus annoying. I also find them visually difficult to distinguish – were I to write lots of Clojure, I’d probably look for a font with significant differences between the two (sigh, four) characters.

The uniform interface across the built-in sequence types is very nice. That said, nine times out of ten it’s a list; this either raises the question of whether the other types are necessary, or it means I’m not yet using Clojure like it’s meant to be used. The ability to apply datatypes as procedures is nice, and leads to some interesting idioms. I’d be interested to know something of the implementation details of this feature, and what kinds of performance costs are involved.

The metadata stuff is all a bit weird, like property lists on crack. I’m having trouble coming up with any good uses for it that I wouldn’t rather do another way, but I’m having trouble coming up with uses for it at all so maybe time will tell if it’s a useful feature. It also seems like the kind of thing that will be abused over time, like hash arguments in Ruby; it will be interesting to see if the Clojure community reaches some consensus regarding the feature and finds a good role for it (or maybe there already is one, I don’t know it).

I haven’t had a reason to use any of the concurrency features yet, but they look well thought-out.

Shame about the unhygienic macros.

The Java interop is probably as good as can be expected. The stack traces are useless, but the ability to develop incrementally at a REPL offsets this. The most frustrating issue in this territory is that object methods aren’t first class, so otherwise concise code necessarily becomes more complex, slowing down to wrap and pass around Java objects to circumvent this limitation.

Even with the decent integration, however, the dependence on the JVM is really hard to swallow. It’s big and slow to start, and I simply don’t want a JDK on my system. Maybe I’ll go check out Arc – at least I already have Racket installed.

2012-02-04 Pointer tagging

Pointer tagging is a very old but still very clever technique for packing more information into a pointer than just a memory address.

It works like this: Pointers to heap data are often 4- or 8-byte-aligned, which means that the lowest couple of bits in any given pointer will be 0s. How many bits you’re guaranteed to have depends on your object layout and architecture; for allocated data on an x86-64 machine it’s three. So, instead of letting those low bits sit around unused, you can encode some information about the data pointed to directly in the pointer. Many programming languages add the type of the referenced object, for example, in order to branch earlier on its type or save space in its representation. So, if you originally had an object layout like this:

struct object {
  unsigned type;
  unsigned size;
  void *data;
};

… You could do away with the type slot for seven of your object types (hopefully the seven most common) by moving the type value into the pointers to those objects. Then, just check those first bits and mask them out before making the jump.

#define TYPE_ONE 1
#define TYPE_TWO 2

#define TYPE(p)  ((uintptr_t)(p) &  7)  /* uintptr_t is from <stdint.h> */
#define VALUE(p) ((void *)((uintptr_t)(p) & ~(uintptr_t)7))

switch(TYPE(p)) {
  case TYPE_ONE:
    handle_type_one(VALUE(p));
    break;
  case TYPE_TWO:
    handle_type_two(VALUE(p));
    break;
  ...
}

This can be taken a step further. If we left-shift integer values and mark their types in those newly-freed low bits as well, we can quickly check for an integer-type object and, when one is found, perform arithmetic directly on the bits of the pointer after right-shifting its value back into place. This allows us to treat numerical objects just like any other while getting the speed boost of nearly-immediate integer operations.

#define IS_INTEGER(p) (((uintptr_t)(p) & 7) == TYPE_INTEGER)

if(IS_INTEGER(p))
  value = ((intptr_t)(p) >> 3);

There’s an obvious choice to be made here – defining integers by tag 000 will result in slightly faster numerical code, while defining tag 000 to be a standard object pointer will give you fewer bitshifts when tracking down objects on the heap; it all depends on the use case you want to optimize.

Since learning this trick, I’ve found that just about every programming language uses it to some degree. To my knowledge, all of SML, Smalltalk, Haskell and Ruby (as well as a dozen or so Lisp and Scheme implementations) have used it, or something like it, at one time or another.

There’s a nice, easy-reading paper discussing this technique and many more like it here.

2012-01-11 Allocating pathname buffers

From what I can tell, there’s no strictly-portable way to determine a safe length for a pathname buffer, and the various nonportable APIs for doing so are a clusterfuck.

ISO C defines a macro, FILENAME_MAX, that seems pretty reasonable at first glance:

FILENAME_MAX expands to an integer constant expression that is the size needed for an array of char large enough to hold the longest file name string that the implementation guarantees can be opened;

However, it also includes this discouraging footnote:

If the implementation imposes no practical limit on the length of file name strings, the value of FILENAME_MAX should instead be the recommended size of an array intended to hold a file name string. Of course, file name string contents are subject to other system-specific constraints; therefore all possible strings of length FILENAME_MAX cannot be expected to be opened successfully.

This makes FILENAME_MAX sound a bit unreliable. And that’s exactly the case with glibc, according to its manual:

Unlike PATH_MAX, this macro is defined even if there is no actual limit imposed. In such a case, its value is typically a very large number. This is always the case on the GNU system.

Usage Note: Don’t use FILENAME_MAX as the size of an array in which to store a file name! You can’t possibly make an array that big! Use dynamic allocation (see section Memory Allocation) instead.

So it seems FILENAME_MAX shouldn’t be used, especially if GNU/Linux is a target.

There’s PATH_MAX, but it’s not even guaranteed to be defined, and may be indeterminate when queried via pathconf(3). Moreover, the POSIX standard warns:

Some values, such as PATH_MAX, are sometimes so large that they must not be used to, say, allocate arrays.

So, strictly speaking, PATH_MAX has the same problem as FILENAME_MAX.

Then there’s MAXPATHLEN (which AFAICT isn’t guaranteed to be defined or have any bearing on reality) and _POSIX_PATH_MAX (which is just the minimum POSIX-allowed limit), neither of which help much, either.

This is frustrating, although the difficulty of retrieving a precise value makes some sense given that different filesystems impose different limits on pathname lengths (which entails a runtime check). It seems that in practice most software simply uses PATH_MAX and provides a default value of 1024 or 4096 on platforms where it’s undefined (the only example of which I’m aware being GNU Hurd). When in Rome.
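
For what it’s worth, the “PATH_MAX with a fallback” practice might be wrapped up like this. A sketch only: path_bufsize() is a made-up helper, and the 4096 default is just a common choice:

#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

size_t path_bufsize(const char *dir) {
  long len = pathconf(dir, _PC_PATH_MAX); /* may be -1 if indeterminate */

  if (len <= 0) {
#ifdef PATH_MAX
    len = PATH_MAX;
#else
    len = 4096; /* common fallback where PATH_MAX is undefined */
#endif
  }

  return (size_t)len + 1; /* room for the terminating NUL */
}

/* Usage: char *buf = malloc(path_bufsize(".")); */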

2011-11-15 On Cons and Icons

When discussing the merits of Lisp, one of the first things people tend to mention is macros. And it’s true, macros are a powerful feature, so it’s easy to stop there and leave happy. However, what’s really important about Lisp isn’t its macro expansion phase – many other languages provide something similar – but its simplicity and consistency. The saying is that Lisp’s code is data and its data is code1; a powerful macro system is a direct consequence of that coequality and just one of many examples of the power it affords. It’s telling that John McCarthy’s first question, when asked about potential similarities between Ruby and Lisp, wasn’t “does Ruby have macros?” or even “does Ruby have first-class functions?”, but rather “does Ruby use list structures as data?”, referring to the double-duty performed by lists in his list processing language.

Consider the following, from Darius Bacon’s A Hacker’s Introduction to Partial Evaluation.

(define (emit-square x) `(* ,x ,x))

(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) x)
        ((odd? n) `(* ,x ,(emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))

; (emit-power 'x 5) => (* x (* (* x x) (* x x)))

What we have here isn’t just a macro but a compiler, one that generates exponentiation expressions. And – because Lisp’s fundamental compound data type is precisely that of its syntax tree – it’s Lisp all the way down. If “it is better to have 100 functions operate on one data structure than 10 functions on 10 data structures” and you’re going to design around just one, cons cells are a good choice.

While this technique of compilation by emitting a new syntax tree isn’t groundbreaking (the original C++ compiler emitted C, for example), the ease with which it can be done might be2. “Languages differ not so much in what they make possible, but in what they make easy”; Lisp makes self-reference easy, and because the referents, lists and symbols, are such versatile objects, it’s possible to encode a wide variety of ideas in a simple, common tongue.

A wonderful example of such an encoding is given in Shriram Krishnamurthi’s Automata via Macros, where he uses lists to describe a machine accepting the language c[ad]*r and a program to run it:

(define machine
  '((init (c more))
    (more (a more)
          (d more)
          (r end))
    (end)))

(define (run machine init-state stream)
  (define (walker state stream)
    (cond
      ((null? stream) #t)
      (else
       (let ((in (car stream))
             (transitions (cdr (assv state machine))))
         (let ((new-state (assv in transitions)))
           (if new-state
               (walker (first (cdr new-state)) (cdr stream))
               #f))))))
  (walker init-state stream))

; (run machine 'init '(c a d a d d r)) => #t
; (run machine 'init '(c a d a d d r r)) => #f

With Lisp one gets a flexible DSL entirely for free, all at runtime and no macros required. Here, it’s a language for describing state machines, but similar techniques have been used to great effect for everything from defining regular expressions to building XML to querying SQL databases. Even when creating sublanguages, strong axioms remove the need for string-munging and ad-hoc parsers: one can simply (write (eval (read))).

It’s interesting to note that Lisp’s shared representation of code and data is somewhat of an historical accident. As originally conceived, the language would make use of S-expressions internally, but manually-prepared programs would instead use a notation similar to that of FORTRAN (referred to by McCarthy as “M-expressions”, for “meta-language”). Translation between the two grammars was planned but never implemented, however, as programmers took a liking to the simplicity of Lisp’s parenthetical syntax3.

It was a fortunate lapse. The flexibility of S-expressions have helped Lisp develop a culture of linguistic experimentation, one from which a powerful macro system was a natural but significant development; the ability to absorb and express a wide variety of paradigms has inspired Lisp’s description as a “language laboratory”. See for example Lisp’s switch statement, adapted from Steele & Gabriel’s The Evolution of Lisp, wherein Steele recalls using it to implement some ten interpreters a week:

(defmacro switch (value &rest body)
  (let* ((newbody (mapcar #'(lambda (clause)
                              `(,(gensym) ,@(cdr clause)))
                          body))
         (switcher (mapcar #'(lambda (clause newclause)
                               `(,(cadr clause) (go ,(car newclause))))
                           body newbody)))
    `(block switch
       (tagbody (case ,value ,@switcher)
                (break)
                ,@(apply #'nconc newbody)))))

(defmacro break ()
  '(return-from switch))

(switch n
  (case 0 (princ "none") (break))
  (case 1 (princ "one "))
  (case 2 (princ "too "))
  (case 3 (princ "many")))

In effect, to write code in a simply-homoiconic language is to do two things at once: to create a computer program and to create a series of data structures. That one can manipulate the other is the magic of it all. As a final illustration, here’s a Scheme program implementing a key-value data store that rewrites itself to include newly-inserted values (a literal interpretation of the phrase “self-modifying code”):

(define data '()) ; Initial data.

(let ((self (car (command-line)))
      (args (cdr (command-line))))
  (case (string->symbol (car args))
    ((get)
     (let ((pair (assoc (cadr args) data)))
       (when pair ; Print associated value.
         (display (cdr pair))
         (newline))))
    ((set)
     (let ((program ; Read own program.
            (with-input-from-file self
              (lambda () (read) (read)))))
       (with-output-to-file self
         (lambda ()
           (write ; Insert new value.
             `(define data
                '(,(cons (cadr args) (caddr args))
                  ,@data)))
           (write program)))))))

$ scheme data.scm get a
$ scheme data.scm set a 1
$ scheme data.scm set b 2
$ scheme data.scm get a
1
$ scheme data.scm get b
2
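
For the curious, after the two set commands above, the first form in data.scm has been rewritten to something like the following (the exact formatting depends on the implementation’s write, and the original comments are lost along the way):

(define data '(("b" . "2") ("a" . "1")))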

At this point, my Scheme bias is probably showing. If you have any examples of effective (or even just amusing) uses of homoiconicity in Scheme, Lisp or any other language, please send them my way.


  1. The term for this is homoiconicity, but the property itself isn’t terribly useful without Lisp’s iconic simplicity. If one felt pedantic, one could argue that, “well, programs are generally written as sequences of characters, so any language that can operate on strings is fundamentally homoiconic as well”. That’s technically true, but don’t stop there: strings can be encoded as lists of numbers, so any language that can operate on numerical expressions is the same. Then, you can Gödel-encode those expressions and claim that any Turing-complete language is homoiconic. This misses the point in classic fashion – the point is that Lisp’s simplicity makes the manipulations above not just possible but trivial.↩︎

  2. There are some modern languages, such as Io and REBOL, that do very interesting things involving introspection and self-modification. Lisp, however, predates each, and has certainly served as the foundation for such new ideas.↩︎

  3. There have actually been many attempts to remove the parentheses, most of which have floundered. Notable as fairly direct Lisp descendants without its S-expressive syntax are Logo and Dylan.↩︎

2011-10-01 Dynamically modifying methods in Ruby

(Or, Navigating Nil Without the Tap Dancing).

I recently wanted to limit the range of return values for a few of Ruby’s built-in Array methods. Luckily, the language provides a set of (somewhat obscure) methods to achieve just this.

Aside: Method Chaining in Ruby

Array’s “bang” methods (compact!, uniq!, etc.), which modify an array in place, all return nil when no changes are made to the array. This is occasionally annoying, as it causes the following to raise an exception if any method in the chain fails to mutate its receiver:

[a, r, r, a, y].compact!.uniq!.each { |e| puts e }

What we’d like is for these methods to return self unconditionally. Others have come up with solutions to this problem (like this one, using Object#tap), but if you don’t like the methods the way they are (and you don’t need to maintain the default behavior for anyone else), why not just change them?

Back to Modifying Methods

To do this, we could override each method one by one, but that’s a pain. Instead, we need a way to append a small amount of code to each of these methods without losing their original definitions in the process:

class DeviantArray < Array
  Array.instance_methods(false).grep(/!/).each do |m|
    orig = instance_method(m)
    define_method(m) do |*args|
      orig.bind(self).call(*args)
      self
    end
  end
end

The first bit,

Array.instance_methods(false).grep(/!/).each do |m|

grabs the subset of Array’s instance methods with a “!” in the name. Inside the loop, things get a little less clear:

orig = instance_method(m)

Module#instance_method, according to ruby-doc, “Returns an UnboundMethod representing the given instance method.” More plainly, it snaps off one of a module’s instance methods, detaching it from any particular receiver and letting us pass it around without having to refer to it by name. This is useful, since it lets us hold onto the body of a method even after defining a new one under its old name.

The next block performs this redefinition:

define_method(m) do |*args|
  orig.bind(self).call(*args)
  self
end

define_method does just that. The given block is the new method content; inside, our detached method is rebound to the calling object, invoked, and self is returned.

Now, we can do the following without worrying about nil values:

DeviantArray.new([a, r, r, a, y]).flatten!.compact!.reverse!.shuffle!.uniq!.sort!

A Simpler Example

This technique isn’t only useful for changing return values – one might also use it to add a hook to a set of methods. Here’s a trivial example:

class String
  [:chomp, :delete, :gsub, :squeeze, :strip, :sub].each do |m|
    orig = instance_method(m)
    define_method(m) do |*args|
      before = size
      after = orig.bind(self).call(*args)
      puts "Removed #{before - after.size} characters."
      after
    end
  end
end

Now, we get some useful feedback when we call the following:

>> "aabbcc".squeeze
Removed 3 characters.
=> "abc"

>> " abc ".strip
Removed 2 characters.
=> "abc"

>> "abcxyz".delete "xyz"
Removed 3 characters.
=> "abc"

You probably shouldn’t do this kind of thing in a large codebase with multiple authors, but if you have the freedom to tinker this can be a powerful tool.

2011-10-01 GLUT without glutMainLoop

Want to use GLUT for window creation but don’t want to hand over your whole main loop? Here are three options:

2011-10-01 Monadic meditations

Half-baked monad1 tutorials are currently in vogue, so here’s mine. It’s a retracing of how I began to grok the fullness rather than a proper tutorial, but it starts simple and builds slowly so some may find it helpful.


It isn’t difficult to understand monads on a practical level given the right explanation. For me, that was the following:

Monads provide a functional way to build a specialized evaluation environment, where some implicit behavior is threaded through a series of other computations.

That’s still somewhat abstract, so let’s look at a contrived example. Imagine the series of actions necessary to mail a postcard. A program to perform these actions in an imperative style might look like this:

(define (send postcard)
  (set! postcard (write-message postcard))
  (set! postcard (address postcard))
  (set! postcard (stamp postcard))
  (mail postcard))

Now, because each step modifies the state of the given postcard, order of operations comes into play. In an eager language, this program behaves as expected, since it’s evaluated from top to bottom. However, in a lazy language like Haskell where there’s no such guarantee of evaluation order, the postcard might be mailed before it’s written, addressed, or stamped. In such a case, one must explicitly sequence the operations in order to ensure the postcard is sent correctly.

Finding Your Identity

To do this, one could rewrite the program without mutation as a series of lets, as follows:

(define (send postcard)
  (let ((written (write-message postcard)))
    (let ((addressed (address written)))
      (let ((stamped (stamp addressed)))
        (mail stamped)))))

Or, written more succinctly as a let*:

(define (send postcard)
  (let* ((written   (write-message postcard))
         (addressed (address written))
         (stamped   (stamp addressed)))
    (mail stamped)))

And you know what? let* is a lot like a monad2. Specifically, it’s very similar to the simplest monad, the identity monad. To see this, first take a look at the two functions that comprise the identity monad:

(define (bind value function)
  (function value))

(define (result value) value)

Any monad can be thought of as a pair of functions, bind and result, chained together. In the case of the identity monad, these are trivial: bind applies a function to a value, and result simply returns its argument. How do these simple operations combine to do anything useful? Well, with the identity monad’s two procedures, one can bind a value to a name in the following way:

(bind 2 (lambda (x) (result (+ 1 x))))

The above is equivalent to:

((lambda (x) (+ 1 x)) 2)

Which itself is equivalent to:

(let ((x 2))
  (+ x 1))

Indeed, a simple Lisp implementation might define let in precisely this way, as an inversion of the identity monad’s operations. In fact, using these basic building blocks, a monadic let* of arbitrarily many terms can be expressed by a macro like the following:

(define-syntax monad-let*
  (syntax-rules ()
    ((_ () <body> ...)
     (result (begin <body> ...)))
    ((_ ((<term1> <expr1>) (<term2> <expr2>) ...) <body> ...)
     (bind <expr1>
           (lambda (<term1>)
             (monad-let* ((<term2> <expr2>) ...)
               <body> ...))))))

Given this form, the following expansion occurs:

(monad-let* ((a 1)
             (b 2)
             (c (+ a b)))
  (+ a b c))

… Into…

(bind 1 (lambda (a)
          (bind 2 (lambda (b)
                    (bind (+ a b) (lambda (c)
                                    (result (begin (+ a b c)))))))))

… Which correctly evaluates to 6. So, our monad-let* form binds names to computation results, one by one, before finally returning a single value calculated with those names. This is the evaluation environment defined by the identity monad – one of explicit ordering.

If these manipulations seem simple, that’s because, well, they are. But defining such operations as bind and result at a basic level allows for the reliable composition of more complex evaluation environments.

A State of Enlightenment

Returning to our postcard example, we now have the following:

(define (send postcard)
  (monad-let* ((written   (write-message postcard))
               (addressed (address written))
               (stamped   (stamp addressed)))
    (mail stamped)))

You’ll notice that all of these functions act very much in the same way: take a postcard, return a modified postcard. Each just moves the object it’s given from one state to another. Recognizing this, one might like to specify an evaluation environment where these state changes are implicit and the syntactic burden of passing state from one function to the next lies elsewhere. The state monad, whose bind and result operations are defined below, does just this.

(define (bind value function)
  (lambda (state)
    (let-values (((value* state*) (value state)))
      ((function value*) state*))))

(define (result value)
  (lambda (state)
    (values value state)))

This monad’s operations return unary procedures. They’re simple enough to pick apart: the procedure returned by bind takes a state, runs the monadic value on it, and hands the resulting value and new state to the bound function, while result just closes over its argument and returns it alongside whatever state it’s given. What’s important to note is that, when these operations are chained in a monadic composition (such as our monad-let* form), there is now a new value, a state, being threaded through it all.
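
To make that concrete, here’s a single bind/result chain from the state monad applied by hand (increment is just an illustrative name; bind and result are the state-monad versions above):

(define increment
  (bind (result 1)
        (lambda (x) (result (+ x 1)))))

(increment 'any-state) ; => 2, any-state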

This can be seen with another simple call to monad-let*, this time using the state monad’s bind and result operations. Instead of a plain value, a procedure is now returned that, when applied to an initial state, threads that state through the sequence of computations implicitly3.

(define add-two
  (monad-let* ((a (result 0))
               (b (result 2))
               (value (result (+ a b))))
    value))

(add-two 5)     ; => 2, 5
(add-two "ten") ; => 2, "ten"

Hm, clearly the initial value is just being passed along and returned. In order to actually manipulate the state, we need two simple functions, get and put (along with a monadic add to make things clearer):

(define (get) ; Yield the current state as the value, leaving it unchanged.
  (lambda (state)
    (values state state)))

(define (put value) ; Replace the state with value (and return it as the value, too).
  (lambda (state)
    (values value value)))

(define (add . lst) ; A monadic addition, wrapping its sum with result.
  (result (apply + lst)))

Now, add-two actually does something: it reads the initial state with get, adds two to it, and puts the sum back as the new state, all while returning the untouched initial value:

(define add-two
  (monad-let* ((init   (get))
               (value  (add init 2))
               (result (put value)))
    init))

(add-two 5) ; => 5, 7

A redefinition of our postcard-sending routine using the state monad might look something like the following, where each function is defined to act on a state and return a monadic value:

(define send
  (monad-let* ((_      (write))
               (_      (address))
               (_      (stamp))
               (result (mail)))
    result))

(send postcard) ; => success!
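
For the sake of illustration, those four functions might be sketched as follows, with the postcard represented as a plain list of completed steps. (They’d need to be defined before send, and note that this write deliberately shadows the built-in procedure of the same name, as the example above already implies.)

(define (write)   (lambda (postcard) (values 'ok (cons 'written postcard))))
(define (address) (lambda (postcard) (values 'ok (cons 'addressed postcard))))
(define (stamp)   (lambda (postcard) (values 'ok (cons 'stamped postcard))))
(define (mail)    (lambda (postcard) (values 'success! (cons 'mailed postcard))))

(send '()) ; => success!, (mailed stamped addressed written)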

A more real-world example of this pattern is I/O. In fact, Haskell’s I/O monad can be thought of as a state monad in which the state is the outside world itself; here, the state will simply be an open file. The abstractions are similar:

(define (open)
  (lambda (state)
    (let ((new (open-input-file state)))
      (values new new))))

(define (close)
  (lambda (state)
    (values (void) (close-input-port state))))

(define (rewind)
  (lambda (state)
    (set! (file-position state) 0)
    (values (void) state)))

(define (get-line)
  (lambda (state)
    (let ((line (read-line state)))
      (if (eof-object? line)
          (values "Mu." state)
          (values line state)))))

(define achieve-enlightenment
  (monad-let* ((_       (open))
               (before  (get-line))
               (_       (rewind))
               (after   (get-line))
               (then?   (get-line))
               (_       (close)))
    (print (format "Before enlightenment? ~s" before))
    (print (format "After enlightenment? ~s" after))
    (print (format "And then? ~s" then?))))

(achieve-enlightenment "zen.txt")

; Before enlightenment? "Chop wood, carry water."
; After enlightenment? "Chop wood, carry water."
; And then? "Mu."

Why? Why Not?

Hopefully by now you see what I mean by an evaluation environment. In each of these examples, some implicit behavior accompanied the computation through its steps, creating a mini-language that helped to meet a specific requirement. And, by their nature, monads can be composed and combined, allowing for powerful pipelined expressions.

So, while monads might seem to layer a silly amount of abstraction on such trivial examples, and while it is true that monads tend to make more sense in lazy languages (when a specific evaluation order is desired), pure languages (when an imperative style is needed), strongly-typed languages (where their correctness can be strictly verified), or languages where similar abstractions aren’t provided (a Lisper, for instance, might use macros or parameter objects where state is concerned), the pattern they provide might be useful from time to time. And besides, learning new solutions (even to old problems) is always a good thing.
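
As an aside, here’s roughly what the parameter-object alternative mentioned above looks like in Scheme (make-parameter and parameterize are standard, per SRFI 39 and R7RS; current-postcard and the values it holds are invented for the example):

(define current-postcard (make-parameter 'blank))

(define (mail) ; Reads the implicit, dynamically-bound state instead of taking an argument.
  (list 'mailed (current-postcard)))

(parameterize ((current-postcard 'stamped-and-addressed))
  (mail)) ; => (mailed stamped-and-addressed)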


  1. “Half-baked monads” would be a good band name.↩︎

  2. This isn’t really true. This article skips a lot of technicalities such as type requirements and associativity rules. You can find these elsewhere.↩︎

  3. result must be used here to provide monadic values from the primitives 0 and 2. A more specialized macro than monad-let* would alleviate the need.↩︎

2011-10-01 Anonymous functions in C99

This post on comp.lang.c describes a preprocessor macro to add convenient support for anonymous functions to C:

#define function(return_type, body_and_args) \
  ({ \
    return_type __fn__ body_and_args \
    __fn__; \
  })

While not technically a lambda (it relies on GCC’s statement-expression and nested-function extensions), it does allow for some fun constructs. One can use it to express forms from other languages that naturally take inline functions, such as Lisp’s with-open-file or Ruby’s each_line:

#include <stdio.h>
#include <stdlib.h>

void with_open_file(const char *filename, void(* fn)(FILE *)) {
  FILE *f;
  if ((f = fopen(filename, "r")) == NULL) {
    perror("Cannot open source file");
    exit(1);
  }
  (* fn)(f);
  fclose(f);
}

void each_line(FILE *f, void(* fn)(char *)) {
  char l[8192];
  while (fgets(l, sizeof(l), f))
    (* fn)(l);
}

int main(int argc, char **argv) {
  with_open_file(argv[1], function(void, (FILE *f) {
    int i = 0;
    each_line(f, function(void, (char *l) {
      printf("%i %s", ++i, l);
    }));
  }));
  return 0;
}

It’s easy to forget some of the cool things C macros can do. As an example, the source of the V7 Unix shell (Stephen Bourne’s original sh, the inspiration for bash) hardly looks like C in some places, thanks to a header full of ALGOL-style macros.