I won’t be porting the following eggs, which are obsolete and unsupported: bitcoin, glfw, and yahoo-finance.
I’ll probably also port combinator-birds for fun, but it won’t be released to the egg servers.
2018-02-15 Improved Scheme language support in Vim
The next release of Vim, version 8.1, will include the runtime files from my vim-scheme plugin, and I’ll be maintaining Vim’s built-in Scheme support from now on.
Development will still happen in the linked repository, so you can continue to use the plugin if you want to stay as up-to-date as possible.
If you run into any problems editing Scheme with stock Vim, however, please don’t hesitate to contact me. Patches are also very welcome. In particular, I’d like to extend the keyword highlighting to support dialects other than CHICKEN (which is already pretty well-covered).
Like most people, my email setup is a Byzantine horror of shell scripts and Unix abuse. In an effort to further complicate things, I recently wrote a program that monitors a mail spool and runs a command whenever it receives new messages. Here it is in its entirety. It’s very simple and probably not of general use, but I figured I’d share it anyway since it turned out to be a nice little example of how to use kqueue(2).
I’m running it like so, where sortmail does some tagging and shuffles messages into the correct folders:
$ mm "$HOME/bin/sortmail"
The kqueue(2) API is pleasant enough, and strikes me as hitting just the right level of abstraction for its intended use.
A trick Scheme programmers use to make their scripts directly executable is to start them with a block comment and a line that immediately executes a Scheme interpreter:
Creating a file when it doesn’t already exist is surprisingly hard to do with Ansible.
The file module has a state=file mode, but it doesn’t create files. Obviously.
It also has a state=touch mode, but that will unconditionally modify the file’s timestamp and register a “change” event, even when the file already exists.
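One way to square that circle, sketched here from the description that follows rather than taken from a known-good playbook (the `changed_when` expression and the path are assumptions), is to touch the file but override Ansible's idea of what counts as a change:

```yaml
- name: Create the file only if it does not already exist
  file:
    path: /etc/example.conf   # hypothetical path
    state: touch
  register: touched
  changed_when: touched.diff.before.state == "absent"
```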
This only works because variables are (apparently) defined before a decision is made about whether something has changed, but that’s good enough for me. There are, of course, other ways one might accomplish this, but I was after a single task and wanted to avoid the command and shell modules, if possible.
I don’t know why the file module’s state=file option doesn’t create files. Ansible’s docs suggest that one “see the copy or template module if you want that behavior”, but introducing a template just to create an empty file seems pretty naff to me.
I use Vultr for server hosting, and while they have support for automatically installing Linux and Windows there’s currently no way to deploy OpenBSD from a release ISO without logging in to a VNC console and running the installer manually.
That said, it’s pretty easy to build a custom OpenBSD disk image that will self-install when booted. IANAOBSDD, but the project’s build scripts are very readable and easy to modify.
Put an auto_install.conf file in the disk’s root. This will trigger automatic installation when the disk is booted.
Optionally put a disklabel.conf file in the disk’s root. This will control how the disk is partitioned.
Optionally put an etc.rc.firsttime script in the disk’s root. This will be appended to /etc/rc.firsttime in the installed system and run on first boot.
Add these files to the install manifest for your architecture.
Build a new CD image.
The files I used for steps 1-3 look roughly like this:
$ cat /usr/src/distrib/miniroot/auto_install.conf
System hostname = wunderbox
Password for root account = turtle
Do you expect to run the X Window System = no
Setup a user = evhan
Password for user = turtle
Use (W)hole disk = W
URL to autopartitioning template = file:/disklabel.conf
Location of sets = http
Set name(s) = -game*
Set name(s) = -x*
What timezone are you in = UTC
$ cat /usr/src/distrib/miniroot/disklabel.conf
/ 512M-1G 8%
swap 1G
/tmp 512M-1G 8%
/var 4G-6G 30%
/usr 2G-6G 20%
/usr/local 2G-10G 20%
/home 1G-* 10%
$ cat /usr/src/distrib/miniroot/etc.rc.firsttime
echo 'ssh-rsa ...' >> /home/evhan/.ssh/authorized_keys
The firsttime script can do whatever you like; adjust to taste. Also note that the disklabel template can be any URL supported by ftp(1), not just a local file.
Add these files to the build manifest for your architecture. Vultr uses amd64 boxes, so I modified distrib/amd64/common/list. I also added a single line to distrib/miniroot/install.sh so that my firsttime script runs when the new system is started. The final diff should look something like this:
$ make -C /usr/src/distrib/special
$ make -C /usr/src/distrib/amd64
$ file /usr/obj/distrib/amd64/cdfs/cd59.iso
/usr/obj/distrib/amd64/cdfs/cd59.iso: ISO 9660 CD-ROM filesystem data 'OpenBSD/amd64 5.9 boot-only C' (bootable)
Use cd59.iso (or whatever version you’ve just built) as a custom image when creating a new server and it will auto-install when booted.
One-time ISO Boot
This approach still requires that you detach the install media manually once the installation finishes. If you don’t, the server will continue to boot from cd59.iso even after the OS is present, leading to a never-ending install/reboot cycle.
As a workaround you can trigger the installation from an iPXE script, which will run just once. A Syslinux MEMDISK can be used to chain-load the ISO, for example:
#!ipxe
initrd http://example.com/cd59.iso
chain http://example.com/memdisk iso raw
Properly securing this approach is left as an exercise for the reader.
Lately I’ve been using the following trick as a quick and dirty way to handle command line arguments:
(define (command-line-data)
  (map (lambda (s) (call-with-input-string s read))
       (command-line-arguments)))
(define (main #!key (foo 1) (bar 'abc) (baz '()))
  (printf "~s ~s ~s~n" foo bar baz))

(apply main (command-line-data))
The command-line-data procedure just converts a program’s arguments to a list of Scheme data, which isn’t too interesting by itself. The real trick is to write your main procedure so that it accepts whatever keyword arguments you want your program to accept, then invoke your program as though you were calling it from Scheme code:
The first describes the flushing behaviour of output streams.
The second specifies that fflush(NULL) will flush all streams “for which the behavior is defined above” (emphasis mine).
The third, a POSIX.1-2008 extension, describes the flushing behaviour of input streams.
As a result of that single word “above”, a strictly-conformant libc will implement fflush(NULL) so that only output streams are flushed, despite the fact that the behaviour of fflush() on input streams is perfectly well-defined in POSIX.1-2008. Indeed, that’s exactly what musl does, which is why the bug was first encountered with that C library.
It’s a shame, since omitting that one word would have resulted in a much more coherent specification.
It looks like a correction for this problem has already been proposed and accepted by the POSIX working group. The fix, which will presumably be included in the next version of the spec, simply swaps the second and third paragraphs of the function’s description.
This is a short walk through the process of packaging a dynamically-linked CHICKEN 4.x application for binary deployment.
Note that this does not apply to CHICKEN 5.x! Newer versions of CHICKEN support static compilation, which is the preferred option.
CHICKEN 4.x on the other hand doesn’t prescribe a process for this, and there are lots of ways to achieve similar results, but this is one approach that has worked well for me in the past.
But First: The Easy Option
Before I describe how to manually bundle an application, I should mention the automatic method that’s almost guaranteed to work for normal programs:
Compile a program and place it, along with the CHICKEN runtime, into a directory named after the input file.
Copy all of CHICKEN’s standard libraries into the directory.
Install eggs, including their dependencies, into the directory.
The reason I’m going to explain an alternative, albeit more complicated, approach to packaging is that this simple method can potentially result in very large application bundles. This is because it bundles every file from the installed eggs that your program could possibly use at runtime, regardless of whether they’re truly needed. This can be wasteful of disk space, but if package size isn’t an issue I suggest you take the easy route.
Bundling the Runtime
If you’re still reading, you’ve picked the hard option. Good choice. As an example program we’ll be packaging a CHIP-8 emulator I wrote in the spirit, though not the deed, of itch.io’s Spring 2016 Lisp Game Jam. To start with, clone the project and fetch its dependencies:
This will download and install the two eggs this program uses into CHICKEN’s extension repository. You can verify that they’ve been installed with chicken-status, and you can see what files they’ve included with chicken-status -f begin-syntax ncurses r7rs.
The commands to actually build the project look like this:
Don’t worry too much about the specifics here; just know that these two commands compile the program, and that the second is the one that actually produces an executable. If you run the resulting file, it should print a usage message:
We now have a binary that works fine locally, but requires that the host system have CHICKEN installed. If you were to copy it onto another machine with the same architecture but without a CHICKEN runtime, it would fail with an error like the following:
$ ./chick-8
./chick-8: error while loading shared libraries: libchicken.so.8: cannot open shared object file: No such file or directory
Luckily, the “CHICKEN runtime” is just a single shared library and, as with the easy option, we can use the compiler’s “-deploy” flag to automatically bundle it with our program in a directory:
We’ve bundled the CHICKEN runtime, but we’re still missing the program’s egg dependencies. Fortunately, eggs are mostly just shared libraries that can be copied into our application directory just like “libchicken.so”. The tricky part is usually figuring out exactly what files to copy, but the CHICKEN compiler has a (slightly obscure) option that can help with that.
Returning to our csc(1) invocations, if we add “-debug M” to both, we’ll see a list of “static” and “dynamic” library requirements for each file. Don’t worry about the static requirements – those are handled at compile time – but do take note of the dynamic ones:
From this we can tell that we need to copy at least1 the “ncurses”, “numbers”, “scheme.base”, and “r7rs” libraries from the extension repository into our program directory. We could do this manually for each object:
$ cp "$(chicken-install -repository)/r7rs.so" chick-8
[... and so on ...]
But it’s easier to just use chicken-status(1) to list the files in the repository and filter for the ones we need:
With that, we have a freely-relocatable application bundle that requires nothing of its host but a working C library2.
Packaging the Result
With this self-contained version of our program, we’re not too far from a package in a format such as deb(5) or rpm(8).
Most OS-specific package formats boil down to a handful of archive files, usually compressed, containing an application’s files and a bunch of metadata. Learning the vagaries of any one of these formats is quite a task, let alone trying to support many of them at once. Thankfully, there are tools that can help.
One such tool that I like is fpm. That link has the details, but for our purposes it’s enough to know that we can hand fpm a directory and it will give us a package in return.
Let’s build that directory. We’ll call it “package”, copy our bundled files inside, and link to the binary itself from /usr/bin:
Now we just pass our “package” directory off to fpm. In this example I’m building a Debian package, but you can control the output format with the -t argument:
As you can see, fpm has added a changelog and filled in some default metadata for us. You’ll typically want to tweak this metadata to reflect reality, which can be done with the program’s many command-line flags (see fpm --help).
While a package built this way is totally unsuitable for inclusion in a distro’s official repositories – I’ve no idea how many RPM packaging guidelines something like this breaks, for example, but I’m pretty sure it’s not none – it’s a fine way to distribute packages within your own infrastructure, or to end users who just want an easy way to install your software.
One major problem that this article totally glosses over is that of transitive dependencies: if an egg your program uses depends on another library at runtime, you’ll need to copy that one too.↩︎
That’s a lie, this example also requires a working ncurses library.↩︎
It runs a single Vim command on standard input, then quits.
An example use case from the last few days: selecting the range of lines between “foo” and “bar”, not including either. I know I can use address offsets in Vim for this, so I do: vimdo '/foo/+1, /bar/-1 p' < file. There may be a similar feature in sed(1) and there’s almost certainly a way to do it in awk(1), but reusing the command language with which one spends hours a day makes sense to me.
2015-04-07 Finding a pathname of the current executable
type Message interface{}

type Become struct {
	f func()
}

func universalServer() chan Message {
	c := make(chan Message)
	go func() {
		switch v := <-c; v.(type) {
		case Become:
			close(c)
			v.(Become).f()
		}
	}()
	return c
}
concurrent = require 'concurrent'

-- Expects messages of the form { t = "become", f = function () ... end }
universal_server = function ()
  return concurrent.spawn(function ()
    local msg = concurrent.receive()
    if msg.t == "become" then
      msg.f()
    else
      -- drop
    end
  end)
end
case class Become(f: () => _)

class UniversalServer extends Actor {
  def receive = {
    case Become(f) => f()
  }
}
This is slightly off, since it isn’t networked (but you can just pretend c is a net.Read sink), and universalServer receives from c unconditionally. The channel should really be buffered, with universalServer ignoring non-Become messages.↩︎
Again, the server should ignore messages where msg.t != "become".↩︎
#!/bin/sh
fold -s \
| vim -R - \
-c "sil run macros/less.vim" \
-c "sil set ft=mail" \
-c "sil set shm+=T"
Safely bogofilter an online mail spool:
#!/bin/ksh
set -o monitor
error=0
if temp=$(mktemp -q)
then
    /usr/libexec/lockspool |&
    if read -p ok && [ "$ok" = 1 ]
    then
        bogofilter -pM < "$MAIL" > "$temp"
        if [ $? -lt 3 ]
        then cat < "$temp" > "$MAIL"
        else error=1
        fi
    fi
    rm -f "$temp"
    exec 3>&p 3>&-
    exit $error
fi
s3cmd(1) supports automatic encryption & decryption of data, but it defaults to using a symmetric cypher and passphrase. Happily, it’s straightforward to configure it to use a public key instead, for those who are so inclined.
Given a block of memory, you can probably express its size as a value of type size_t. This isn’t actually guaranteed – the standard only requires that SIZE_MAX, and thus the range of size_t, be at least 65535 – but it’s pretty safe to assume, since AFAIK most everybody defines size_t according to the machine’s word length.
However, given two pointers, it’s not nearly as certain that you can express their difference with a ptrdiff_t. Because ptrdiff_t is signed (in order to indicate the direction of the difference), its maximum value is SIZE_MAX / 2 (actually PTRDIFF_MAX, also from stdint.h, although the two values should be equal), with undefined behavior if the result of the subtraction doesn’t fit in its range. This strikes me as a far more likely limitation to run up against.
2013-07-16 Compiling statically-linked CHICKEN Scheme programs with extensions
While static compilation with extensions isn’t officially supported by the CHICKEN 4.x toolchain (chicken-install(1) et al.), it’s usually possible. However, doing so means compiling the extension and all of its dependencies yourself.
A simple example of this follows. The general strategy is to:
Fetch each extension’s source.
Compile each extension into units and import files.
Compile your program in the presence of the import files.
Link your program with the units.
All of the information here is available in more detail in the CHICKEN manual – I’ve just extracted the relevant parts and added a narrative.
Also, note that this post only applies to CHICKEN 4.x. In CHICKEN 5.0 static linking is handled automatically when you pass the “-static” flag to csc(1) and the C compiler, so none of the following contortions are required.
Our goal is to produce a statically-linked program that uses the uri-generic extension, which itself uses the matchable and defstruct extensions.
We’ll start with matchable. Fetch its source with chicken-install -retrieve.
Each extension includes several files we’re not interested in1, and then one – if we’re lucky – primary source file. For this extension, that’s matchable.scm.
To use this file statically, we compile it to an ELF object as a “unit”, then link that object into our final program.
So far, little has changed from the manual’s example of compiling multiple files. The important differences are:
The addition of the -unit and -uses flags.
The addition of the -emit-import-library flag.
The -unit and -uses flags enforce the compilation model described in the manual as CHICKEN’s basic mode of operation, despite the fact that neither matchable.scm nor program.scm declare any units as described2.
The -emit-import-library flag causes one additional file, matchable.import.scm, to be produced when compiling matchable.scm. This file contains compile-time information about the matchable module that’s used by csc when it’s imported into another program3.
Now, do the same for defstruct.
$ cat program.scm; echo
(use matchable defstruct)
(defstruct point x y)
(match (make-point x: 1 y: 2)
  (($ point 1 2)
   (display "Yippee!")
   (newline)))
$ chicken-install -r defstruct > /dev/null
$ csc -unit defstruct -cJ defstruct/defstruct.scm -o defstruct.o
$ csc -uses matchable,defstruct -c program.scm -o program.o
$ csc -static program.o defstruct.o matchable.o -o program
$ file program
program: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=4cd22ac0bd74ba3721fbfb6c4dc8dc2eb5e4a170, not stripped
$ ./program
Yippee!
The binary now includes both matchable and defstruct4.
You can see where this is going. For each extension we use, and for each extension any of those extensions use (and so on, recursively – this is Scheme, after all), build an object with -unit and declare it as a dependency of every file that uses it with -uses.
One last time, with uri-generic (remember, that’s why we started down this rabbit hole in the first place). Here, uri-generic uses matchable and defstruct, while our program uses only uri-generic.
We now have a static binary containing all of the required extensions.
Note that, when it comes time to link our program, an object for every extension that’s used must be included in the resulting binary. This is tedious, but because they’re standard ELF objects, they can be bundled into archives, making them easier to manage in bulk.
There you have it. It’s a bit of work, but the same can be done for most extensions, and the ability to produce a self-contained program that makes use of the many eggs in CHICKEN’s coop is nice when needed.
The setup and meta files in particular are used by CHICKEN’s package manager.↩︎
In fact, the -unit and -uses flags are equivalent to inserting those declarations manually. Again from the manual:
-unit NAME
Compile this file as a library unit.
Equivalent to -prelude "(declare (unit NAME))"
-uses NAME
Use definitions from the library unit NAME.
This is equivalent to -prelude "(declare (uses NAME))".
Indeed, because CHICKEN’s module system is entirely syntactic, this file can be discarded once our program has been compiled. Import libraries are described in more detail here.↩︎
A closer look at matchable and defstruct will show that they’re both syntax-only extensions, so it’s technically unnecessary to compile them into objects at all. Because macros are included in a module’s import library, it’s enough to produce matchable.import.scm and defstruct.import.scm and have those files available (i.e. present in the current directory) when compiling program.scm. This makes them bad examples in particular, but this article gives a general approach that will work with other extensions, too.↩︎
If you really want to feel rage, try comparing some of these with ==, which is notoriously dangerous. For example:
> '' == 0
true
> 0 == '0'
true
> '' == '0'
false
(What, you thought equality was transitive? Don’t be silly.)
Because of things like this, I think there’d be value in a “CoffeeScript lite” that rewrites the absurd parts of JavaScript (such as replacing all instances of == with === as CoffeeScript does, or fixing instanceof so that "abc" instanceof String is true) without implementing an entirely new syntax.
One of the most useful scripts in my bin is dt. It manages named dtach(1) fifos under $HOME/.dtach.
#!/bin/sh
name=$1
dir=$HOME/.dtach
usage() {
    echo "usage: $(basename "$0") [-n name] cmd"
}
if [ $# = 0 ]
then
    ls -1 "$dir"
else
    mkdir -p "$dir"
    while getopts hn: arg; do
        case $arg in
            n) name=$OPTARG; shift 2;;
            h) usage; exit;;
            ?) usage >&2; exit 1;;
        esac
    done
    DTACH="$(date +%s)-$name" \
        dtach -A "$dir/$name" -r winch "$@"
fi
Called without arguments, it lists running sessions. Called with a command line, it either resumes the session with the name of the program to be run, or starts a new one if none exists. An alternate session name can be given with -n.
I use this command a lot:
$ fc -ln 0 | wc -l
9367
$ fc -ln 0 | grep -c '^ dt[ $]'
1264
Much of the following information is out of date as of Vim 8.1, which incorporates my vim-scheme plugin and thus includes many of the changes described here without any extra configuration.
Indentation
Vim’s built-in Lisp indentation (“set lisp”) is hardcoded to shift inner forms by two spaces. I much prefer the indentation style used by the various emacsen, where identifiers not in lispwords are vertically aligned; this patch against Vim 7.3 emulates this style.
Alternatively, you can use my schematic-format utility as a source formatter. To configure it as Vim’s = operator for Scheme files, use the following command:
au FileType scheme setl equalprg=schematic-format
Lispwords
If you choose not to use an 'equalprg', getting lispwords right can ease a lot of manual-indentation pain. The most important change to make is…
au FileType scheme setl lispwords-=if
… which vertically aligns the arms of if expressions.
Syntax
Vim colors parentheses as Delimiters by default, which is too bright for my taste in most colorschemes. Additionally, I like to have a visual indication of which parentheses are parts of quoted forms, as opposed to normal code. This patch colors “normal” parentheses as Comments (which are light grey in my colorscheme), while parentheses in quote and quasiquote forms remain colored as Delimiters (red, with inner unquotes returning parentheses to the lighter Comment color).
I also maintain reasonably complete syntax files for R7RS Scheme and CHICKEN’s extensions to the language. To use them, either install my vim-scheme plugin with your plugin manager of choice or place the following files in the corresponding directories under ~/.vim/:
I’ve written about simple REPL integration here. I still use a slight variation on that approach.
Plugins
Surround
If you use vim-surround, the following line adds a binding to the s key that surrounds forms in a function application, so that, for example, yssscons wraps the current line in (cons ...).
let g:surround_115 = "(\1function: \1 \r)"
Smart Input
lexima is the best autoclose plugin I’ve found so far. It’s based on the solid vim-smartinput, but provides the added benefit of dot-repeatable insertions.
It also works well for Scheme, with minor customization. The following lines in ~/.vimrc prevent it from autoclosing quote and quasiquote characters:
Setting keywordprg gives you access to chicken-doc’s documentation for a given identifier when K is pressed over it.
setl keywordprg=chicken-doc
Completion
Vim
The following script dumps function names from chicken-doc into files named by node under ~/.vim/completion/scheme.
#!/bin/sh -e
dir="$HOME/.vim/completion/scheme"
first() { cut -d' ' -f1; }
mkdir -p "$dir"
for n in $(chicken-doc -c chicken | first)
do
chicken-doc -c chicken "$n" | first | tee "$dir/chicken-$n"
done
for n in $(chicken-doc -c | first)
do
[ "$n" = "chicken" ] || chicken-doc -c "$n" | first | tee "$dir/$n"
done
These files can be sourced to add completion for CHICKEN identifiers by adding the following to ~/.vim/ftplugin/scheme.vim:
setl complete+=d,k~/.vim/completion/scheme/**
Obviously, the same could be done for any other language, given a way to generate its wordfile(s).
csi
The same wordfiles can be used for tab completion in csi with rlwrap.
To recursively watch a directory, one typically monitors IN_CREATE events and, if the subject is a directory, inotify_add_watches it as well.
However, this introduces the potential for what are essentially TOCTTOU bugs: by the time you’ve inotify_add_watched a newly-created directory, it may already have been the subject of events you’re interested in. To correct for this, you have to add the watch and then immediately walk the directory, acting on any files created before the watch was registered, before returning to handle new events from the watch itself.
Even more sinister is the way list comprehensions brutally pollute their surrounding scope:
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> for x in [1, 2, 3]:
... x + 1
...
2
3
4
>>> x
3
>>> [x + 1 for x in [4, 5, 6]]
[5, 6, 7]
>>> x
6
This has been fixed in Python 3, thankfully. For 2.x releases, you can work around it with map (for the latter case, anyway).
I’m not an Emacs user and I don’t care for slimv, but the ability to send text from $EDITOR to a running REPL a la SLIME is invaluable. For a while I used a modified version of tslime.vim, but I’ve since ditched it for something much simpler.
In sh(1):
mkfifo repl
cat > repl & $command < repl
In vim(1):
nmap <C-c><C-c> vap:w >> repl<CR>
That’s it. ^C^C sends the toplevel expression under the cursor to the $command reading from repl, no plugins or external tools are needed, and concerns are separated correctly: the editor and repl are isolated by pty and the view is managed by the window manager, as it should be.
2012-04-08 Tricksy byte to ASCII hex value conversions
short n = 10;
n["0123456789ABCDEF"]; /* => 'A' */
Too cute by half.
2012-03-18 Three ways to hook a library function in C
Say one wanted to modify malloc such that it punted on allocations larger than 1024 bytes.
At runtime, by library interposition using LD_PRELOAD. This is the most common method. With the following as hook.c:
/*
* gcc -shared -fPIC -D_GNU_SOURCE -ldl -o hook.so hook.c
* LD_PRELOAD=./hook.so ./program
*/
#include <dlfcn.h>
#include <stdio.h>
void *malloc(size_t size) {
    typeof(malloc) *orig_malloc;
    if (size > 1024) {
        fprintf(stderr, "Whoa, %zu is a lot of bytes... I don't think I'm up for that.\n", size);
        return NULL;
    }
    orig_malloc = dlsym(RTLD_NEXT, "malloc");
    return orig_malloc(size);
}
At compile time by macro-redefinition of malloc:
/*
* echo '#include "hook.c"' | cat - program.c > hooked.c
* gcc -o program hooked.c
* ./program
*/
#include <stdio.h>
#include <stdlib.h>
void *hook_malloc(size_t size) {
    if (size > 1024) {
        fprintf(stderr, "Whoa, %zu is a lot of bytes... I don't think I'm up for that.\n", size);
        return NULL;
    }
    return malloc(size);
}

#define malloc(n) hook_malloc(n)
At link time using the linker’s --wrap feature, which rewrites malloc to __wrap_malloc and __real_malloc to malloc:
/*
* gcc -Wl,--wrap,malloc -o program hook.c program.c
* ./program
*/
#include <stdio.h>
#include <stdlib.h>
typeof(malloc) __real_malloc;

void *__wrap_malloc(size_t size) {
    if (size > 1024) {
        fprintf(stderr, "Whoa, %zu is a lot of bytes... I don't think I'm up for that.\n", size);
        return NULL;
    }
    return __real_malloc(size);
}
begin return yield ensure unlock end if block and lock
Totally nonsensical and perfectly valid, like a Ruby version of Perl line noise. For context, it’s from a tiny lockfile class.
class Lockfile
def initialize(file)
@file = file
end
def locked?
File.exists?(@file)
end
def lock(&block)
begin return yield ensure unlock end if block and lock
not locked? and File.open(@file, "w") { true } rescue false
end
def unlock
File.delete(@file) and true rescue false
end
end
2012-02-20 Swapping values without a temporary variable
a ^= b ^= a ^= b;
Note that, because the expression modifies its operands more than once without an intervening sequence point, its result is technically undefined in C (though it can be rewritten as three equivalent statements with well-defined results).
Some stream-of-consciousness thoughts after a little while using Clojure.
Firstly, it’s far more frustrating to know that a function exists but have no idea what it’s called or where it’s located than to have no idea what you’re looking for in the first place. As a Schemer, I’m constantly wanting to type member, find, fold, &c, but finding the equivalents in the Clojure documentation takes more time than I’d like (i.e. greater than 2-3 seconds). This isn’t Clojure’s fault by any means, just irritating. clojure.repl’s apropos helps with this, but from what I can tell it doesn’t reach outside the current namespace to find functions.
The most surface-level observation is the heavy use of reader macros. There’s far too much syntax for me; when reading (what I assume to be) idiomatic Clojure in the standard library, I often feel as though I’m drowning in it. Some forms are admittedly quite useful (the literal map syntax is one I like), but some (e.g. #(), which is basically just cut) seem unnecessary and detrimental to readability.
Speaking of readability, I find the literal sequential syntax ([]) very hard to work around. Admittedly, I am of the One True Parenthese (or is that Two?) in that I find square brackets a hindrance even when they’re equivalent to parentheses, but with the difference in behavior that Clojure introduces, keeping them all wrangled becomes important and thus annoying. I also find them visually difficult to distinguish – were I to write lots of Clojure, I’d probably look for a font with significant differences between the two (sigh, four) characters.
The uniform interface across the built-in sequence types is very nice. That said, nine times out of ten it’s a list; this either begs the question of the other types’ necessity, or it means I’m not yet using Clojure like it’s meant to be used. The ability to apply datatypes as procedures is nice, and leads to some interesting idioms. I’d be interested to know something of the implementation details of this feature, and what kinds of performance costs are involved.
The metadata stuff is all a bit weird, like property lists on crack. I’m having trouble coming up with any good uses for it that I wouldn’t rather do another way, but I’m having trouble coming up with uses for it at all so maybe time will tell if it’s a useful feature. It also seems like the kind of thing that will be abused over time, like hash arguments in Ruby; it will be interesting to see if the Clojure community reaches some consensus regarding the feature and finds a good role for it (or maybe there already is one, I don’t know it).
I haven’t had a reason to use any of the concurrency features yet, but they look well thought-out.
Shame about the unhygienic macros.
The Java interop is probably as good as can be expected. The stack traces are useless, but the ability to develop incrementally at a REPL offsets this. The most frustrating issue in this territory is that object methods aren’t first class, so otherwise concise code becomes necessarily more complex as it slows down and passes around Java objects to circumvent this limitation.
Even with the decent integration, however, the dependence on the JVM is really hard to swallow. It’s big and slow to start, and I simply don’t want a JDK on my system. Maybe I’ll go check out Arc – at least I already have Racket installed.
Pointer tagging is a very old but still very clever technique for packing more information into a pointer than just a memory address.
It works like this: Pointers to heap data are often 4- or 8-byte-aligned, which means that the lowest couple of bits in any given pointer will be 0s. How many bits you’re guaranteed to have depends on your object layout and architecture; for allocated data on an x86-64 machine it’s three. So, instead of letting those low bits sit around unused, you can encode some information about the data pointed to directly in the pointer. Many programming languages add the type of the referenced object, for example, in order to branch earlier on its type or save space in its representation. So, if you originally had an object layout like this:
… You could do away with the type slot for seven of your object types (hopefully the seven most common) by moving the type value into the pointers to those objects. Then, just check those first bits and mask them out before making the jump.
This can be taken a step further. If we left-shift integer values and mark their types in those newly-freed low bits as well, we can quickly check for an integer-type object and, when one is found, perform arithmetic directly on the bits of the pointer after right-shifting its value back into place. This allows us to treat numerical objects just like any other while getting the speed boost of nearly-immediate integer operations.
There’s an obvious choice to be made here – defining integers by tag 000 will result in slightly faster numerical code, while defining tag 000 to be a standard object pointer will give you fewer bitshifts when tracking down objects on the heap; it all depends on the use case you want to optimize.
Since learning this trick, I’ve found that just about every programming language uses it to some degree. To my knowledge, all of SML, Smalltalk, Haskell and Ruby (as well as a dozen or so Lisp and Scheme implementations) have used it, or something like it, at one time or another.
There’s a nice, easy-reading paper discussing this technique and many more like it here.
From what I can tell, there’s no strictly-portable way to determine a safe length for a pathname buffer, and the various nonportable APIs for doing so are a clusterfuck.
ISO C defines a macro, FILENAME_MAX, that seems pretty reasonable at first glance:
FILENAME_MAX expands to an integer constant expression that is the size needed for an array of char large enough to hold the longest file name string that the implementation guarantees can be opened;
However, it also includes this discouraging footnote:
If the implementation imposes no practical limit on the length of file name strings, the value of FILENAME_MAX should instead be the recommended size of an array intended to hold a file name string. Of course, file name string contents are subject to other system-specific constraints; therefore all possible strings of length FILENAME_MAX cannot be expected to be opened successfully.
This makes FILENAME_MAX sound a bit unreliable. And that’s exactly the case with glibc, according to its manual:
Unlike PATH_MAX, this macro is defined even if there is no actual limit imposed. In such a case, its value is typically a very large number. This is always the case on the GNU system.
Usage Note: Don’t use FILENAME_MAX as the size of an array in which to store a file name! You can’t possibly make an array that big! Use dynamic allocation (see section Memory Allocation) instead.
So it seems FILENAME_MAX shouldn’t be used, especially if GNU/Linux is a target.
There’s PATH_MAX, but it’s not even guaranteed to be defined, and may be indeterminate when queried via pathconf(3). Moreover, the POSIX standard warns:
Some values, such as PATH_MAX, are sometimes so large that they must not be used to, say, allocate arrays.
So, strictly speaking, PATH_MAX has the same problem as FILENAME_MAX.
Then there’s MAXPATHLEN (which AFAICT isn’t guaranteed to be defined or have any bearing on reality) and _POSIX_PATH_MAX (which is just the minimum POSIX-allowed limit), neither of which help much, either.
This is frustrating, although the difficulty of retrieving a precise value makes some sense given that different filesystems impose different limits on pathname lengths (which entails a runtime check). It seems that in practice most software simply uses PATH_MAX and provides a default value of 1024 or 4096 on platforms where it’s undefined (the only example of which I’m aware being GNU Hurd). When in Rome.
When discussing the merits of Lisp, one of the first things people tend to mention is macros. And it’s true, macros are a powerful feature, so it’s easy to stop there and leave happy. However, what’s really important about Lisp isn’t its macro expansion phase – many other languages provide something similar – but its simplicity and consistency. The saying is that Lisp’s code is data and its data is code1; a powerful macro system is a direct consequence of that coequality and just one of many examples of the power it affords. It’s telling that John McCarthy’s first question, when asked about potential similarities between Ruby and Lisp, wasn’t “does Ruby have macros?” or even “does Ruby have first-class functions?”, but rather “does Ruby use list structures as data?”, referring to the double-duty performed by lists in his list processing language.
(define (emit-square x) `(* ,x ,x))
(define (emit-power x n)
  (cond ((= n 0) 1)
        ((= n 1) x)
        ((odd? n) `(* ,x ,(emit-power x (- n 1))))
        (else (emit-square (emit-power x (/ n 2))))))
; (emit-power 'x 5) => (* x (* (* x x) (* x x)))
While this technique of compilation by emitting a new syntax tree isn’t groundbreaking (the original C++ compiler emitted C, for example), the ease with which it can be done might be2. “Languages differ not so much in what they make possible, but in what they make easy”; Lisp makes self-reference easy, and because the referents, lists and symbols, are such versatile objects, it’s possible to encode a wide variety of ideas in a simple, common tongue.
A wonderful example of such an encoding is given in Shriram Krishnamurthi’s Automata via Macros, where he uses lists to describe a machine accepting the language c[ad]*r and a program to run it:
(define machine
  '((init (c more))
    (more (a more)
          (d more)
          (r end))
    (end)))
(define (run machine init-state stream)
  (define (walker state stream)
    (cond
      ((null? stream) #t)
      (else
       (let ((in (car stream))
             (transitions (cdr (assv state machine))))
         (let ((new-state (assv in transitions)))
           (if new-state
               (walker (first (cdr new-state)) (cdr stream))
               #f))))))
  (walker init-state stream))
; (run machine 'init '(c a d a d d r)) => #t
; (run machine 'init '(c a d a d d r r)) => #f
With Lisp one gets a flexible DSL entirely for free, all at runtime and no macros required. Here, it’s a language for describing state machines, but similar techniques have been used to great effect for everything from defining regular expressions to building XML to querying SQL databases. Even when creating sublanguages, strong axioms remove the need for string-munging and ad-hoc parsers: one can simply (write (eval (read))).
It’s interesting to note that Lisp’s shared representation of code and data is somewhat of an historical accident. As originally conceived, the language would make use of S-expressions internally, but manually-prepared programs would instead use a notation similar to that of FORTRAN (referred to by McCarthy as “M-expressions”, for “meta-language”). Translation between the two grammars was planned but never implemented, however, as programmers took a liking to the simplicity of Lisp’s parenthetical syntax3.
It was a fortunate lapse. The flexibility of S-expressions has helped Lisp develop a culture of linguistic experimentation, one from which a powerful macro system was a natural but significant development; the ability to absorb and express a wide variety of paradigms has inspired Lisp’s description as a “language laboratory”. See for example Lisp’s switch statement, adapted from Steele & Gabriel’s The Evolution of Lisp, wherein Steele recalls using it to implement some ten interpreters a week:
In effect, to write code in a simply-homoiconic language is to do two things at once: to create a computer program and to create a series of data structures. That one can manipulate the other is the magic of it all. As a final illustration, here’s a Scheme program implementing a key-value data store that rewrites itself to include newly-inserted values (a literal interpretation of the phrase “self-modifying code”):
(define data '()) ; Initial data.
(let ((self (car (command-line)))
      (args (cdr (command-line))))
  (case (string->symbol (car args))
    ((get)
     (let ((pair (assoc (cadr args) data)))
       (when pair ; Print associated value.
         (display (cdr pair))
         (newline))))
    ((set)
     (let ((program ; Read own program.
            (with-input-from-file self
              (lambda () (read) (read)))))
       (with-output-to-file self
         (lambda ()
           (write ; Insert new value.
            `(define data
               '(,(cons (cadr args) (caddr args))
                 ,@data)))
           (write program)))))))
$ scheme data.scm get a
$ scheme data.scm set a 1
$ scheme data.scm set b 2
$ scheme data.scm get a
1
$ scheme data.scm get b
2
At this point, my Scheme bias is probably showing. If you have any examples of effective (or even just amusing) uses of homoiconicity in Scheme, Lisp or any other language, please send them my way.
The term for this is homoiconicity, but the property itself isn’t terribly useful without Lisp’s iconic simplicity. If one felt pedantic, one could argue that, “well, programs are generally written as sequences of characters, so any language that can operate on strings is fundamentally homoiconic as well”. That’s technically true, but don’t stop there: strings can be encoded as lists of numbers, so any language that can operate on numerical expressions is the same. Then, you can Gödel-encode those expressions and claim that any Turing-complete language is homoiconic. This misses the point in classic fashion – the point is that Lisp’s simplicity makes the manipulations above not just possible but trivial.↩︎
There are some modern languages, such as Io and REBOL, that do very interesting things involving introspection and self-modification. Lisp, however, predates each, and has certainly served as the foundation for such new ideas.↩︎
There have actually been many attempts to remove the parentheses, most of which have floundered. Notable as fairly direct Lisp descendants without its S-expressive syntax are Logo and Dylan.↩︎
I recently wanted to limit the range of return values for a few of Ruby’s built-in Array methods. Luckily, the language provides a set of (somewhat obscure) methods to achieve just this.
Aside: Method Chaining in Ruby
Array’s “bang” methods (compact!, uniq!, etc.), which modify an array in place, all return nil when no changes are made to the array. This is occasionally annoying, as it causes the following to raise an exception if any method fails to mutate its receiver:
[a, r, r, a, y].compact!.uniq!.each { |e| puts e }
What we’d like is for these methods to return self unconditionally. Others have come up with solutions to this problem (like this one, using Object#tap), but if you don’t like the methods the way they are (and you don’t need to maintain the default behavior for anyone else), why not just change them?
Back to Modifying Methods
In order to do this, we could override each method one-by-one, but that’s a pain. Instead, we need a way to append a small amount of code to each of these methods without losing their original definitions in the process:
class DeviantArray < Array
  Array.instance_methods(false).grep(/!/).each do |m|
    orig = instance_method(m)
    define_method(m) do |*args|
      orig.bind(self).call(*args)
      self
    end
  end
end
The first bit,
Array.instance_methods(false).grep(/!/).each do |m|
grabs the subset of Array’s instance methods with a “!” in the name. Inside the loop, things get a little less clear:
orig = instance_method(m)
Module#instance_method, according to ruby-doc, “Returns an UnboundMethod representing the given instance method.” More plainly, it snaps off a Module’s instance method, disassociating it from its object and letting us pass it around without having to refer to it by name. This is useful, since it allows us to hold onto the contents of a method even after redefining a new one under its old name.
The next block performs this redefinition:
define_method(m) do |*args|
  orig.bind(self).call(*args)
  self
end
define_method does just that. The given block is the new method content; inside, our detached method is rebound to the calling object, invoked, and self is returned.
Now, we can do the following without worrying about nil values:
DeviantArray.new([a, r, r, a, y]).flatten!.compact!.reverse!.shuffle!.uniq!.sort!
A Simpler Example
This technique isn’t only useful for changing return values – one might also use it to add a hook to a set of methods. Here’s a trivial example:
class String
  [:chomp, :delete, :gsub, :squeeze, :strip, :sub].each do |m|
    orig = instance_method(m)
    define_method(m) do |*args, &blk|
      before = size
      after = orig.bind(self).call(*args, &blk)
      puts "Removed #{before - after.size} characters."
      after
    end
  end
end
Now, we get some useful feedback whenever we call one of these methods.
You probably shouldn’t do this kind of thing in a large codebase with multiple authors, but if you have the freedom to tinker this can be a powerful tool.
Let’s face it, GLUT isn’t great. It’s old, clumsy, and any library that demands irreversible control of your application is suspect anyway. GLFW is a much better modern alternative.
The open-source freeglut library includes more flexible event-processing functions than the standard implementation, including glutMainLoopEvent, which processes a single iteration of GLUT’s event loop, allowing you to drive it on your own terms while retaining control of main.
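Sketched in C, the pattern looks something like the following (the running flag and render callback are placeholders of my own; error handling and application logic omitted):

```c
#include <GL/freeglut.h>

static int running = 1;  /* placeholder: your app decides when to stop */

static void render(void) {
    glClear(GL_COLOR_BUFFER_BIT);
    glutSwapBuffers();
}

int main(int argc, char **argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
    glutCreateWindow("mainloop");
    glutDisplayFunc(render);

    while (running) {
        glutMainLoopEvent();  /* process pending events, then return */
        /* ... per-frame application logic runs here, in *your* loop ... */
    }
    return 0;
}
```

The key difference from plain GLUT: glutMainLoop never returns, while glutMainLoopEvent hands control back after each pass, so main stays yours.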
Half-baked monad1 tutorials are currently in vogue, so here’s mine. It’s a retracing of how I began to grok the fullness rather than a proper tutorial, but it starts simple and builds slowly so some may find it helpful.
It isn’t difficult to understand monads on a practical level given the right explanation. For me, that was the following:
Monads provide a functional way to build a specialized evaluation environment, where some implicit behavior is threaded through a series of other computations.
That’s still somewhat abstract, so let’s look at a contrived example. Imagine the series of actions necessary to mail a postcard. A program to perform these actions in an imperative style might look like this:
Now, because each step modifies the state of the given postcard, order of operations comes into play. In an eager language, this program behaves as expected, since it’s evaluated from top to bottom. However, in a lazy language like Haskell where there’s no such guarantee of evaluation order, the postcard might be mailed before it’s written, addressed, or stamped. In such a case, one must explicitly sequence the operations in order to ensure the postcard is sent correctly.
Finding Your Identity
To do this, one could rewrite the program without mutation as a series of lets, as follows:
And you know what? let* is a lot like a monad2. Specifically, it’s very similar to the simplest monad, the identity monad. To see this, first take a look at the two functions that comprise the identity monad:
(define (bind value function)
  (function value))
(define (result value) value)
Any monad can be thought of as a chain of two functions, bind and result. In the case of the identity monad, these are trivial: bind applies a function to a value, and result simply returns its argument. How do these simple operations combine to do anything useful? Well, with the identity monad’s two procedures, one can bind a value to a name in the following way:
(bind 2 (lambda (x) (result (+ 1 x))))
The above is equivalent to:
((lambda (x) (+ 1 x)) 2)
Which itself is equivalent to:
(let ((x 2))
(+ x 1))
Indeed, a simple Lisp implementation might define let in precisely this way, as an inversion of the identity monad’s operations. In fact, using these basic building blocks, a monadic let* of arbitrarily many terms can be expressed by a macro (call it monad-let*) that expands each binding into a nested call to bind; binding a to 1, b to 2 and c to their sum, for example, expands into the following:
(bind 1 (lambda (a)
  (bind 2 (lambda (b)
    (bind (+ a b) (lambda (c)
      (result (begin (+ a b c)))))))))
… Which correctly evaluates to 6. So, our monad-let* form binds names to computation results, one by one, before finally returning a single value calculated with those names. This is the evaluation environment defined by the identity monad – one of explicit ordering.
If these manipulations seem simple, that’s because, well, they are. But defining such operations as bind and result at a basic level allows for the reliable composition of more complex evaluation environments.
A State of Enlightenment
Returning to our postcard example, we now have the following:
You’ll notice that all of these functions act very much in the same way: take a postcard, return a modified postcard. Each just moves the object it’s given from one state to another. Recognizing this, one might like to specify an evaluation environment where these state changes are implicit and the syntactic burden of passing state from one function to the next lies elsewhere. The state monad, whose bind and result operations are defined below, does just this.
This monad’s operations return unary procedures. You can destructure them simply enough; essentially, bind’s result takes a state and applies bind’s original arguments in turn as functions, while result just curries away its argument, which its resulting procedure returns along with a given state. What’s important to note is that, when these operations are chained in a monadic composition (such as our monad-let* form), there is now a new value, a state, being threaded through it all.
This can be seen with another simple call to monad-let*, this time using the state monad’s bind and result operations. Instead of a value, a procedure is now returned that, when applied to an initial value, threads it through the sequence of computations implicitly3.
Hm, clearly the initial value is just being passed along and returned. In order to actually manipulate the state, we need two simple functions, get and put (along with a monadic add to make things clearer):
A redefinition of our postcard-sending routine using the state monad might look something like the following, where each function is defined to act on a state and return a monadic value:
A more real-world example of this pattern might be I/O. In fact, Haskell’s I/O monad is a special case of the state monad, where the state in question is an open file. The abstractions are similar:
Hopefully by now you see what I mean by an evaluation environment. In each of these examples, some implicit behavior accompanied the computation through its steps, creating a mini-language that helped to meet a specific requirement. And, by their nature, monads can be composed and combined, allowing for powerful pipelined expressions.
So, while monads might seem to layer a silly amount of abstraction on such trivial examples, and while it is true that monads tend to make more sense in lazy languages (when a specific evaluation order is desired), pure languages (when an imperative style is needed), strongly-typed languages (where their correctness can be strictly verified), or languages where similar abstractions aren’t provided (a Lisper, for instance, might use macros or parameter objects where state is concerned), the pattern they provide might be useful from time to time. And besides, learning new solutions (even to old problems) is always a good thing.
While not technically a lambda (and requiring GCC extensions), it does allow for some fun constructs. One can use it to express forms from other languages that naturally take inline functions, such as Lisp’s with-open-file or Ruby’s each_line:
It’s easy to forget some of the cool things C macros can do. As an example, the V7 Unix shell (Steve Bourne’s sh, the ancestor of today’s Bourne-compatible shells, bash included) hardly looks like C in some places.