Labels
Exceptions and hash tables
(unfinished)
Mutually recursive functions
Suppose I want to define two functions which call each other. This is actually not a very common thing to do, but it can be useful sometimes. Here's a contrived example (thanks to Ryan Tarpine): The number 0 is even. Other numbers greater than 0 are even if their predecessor is odd. Hence:
# let rec even n =
match n with
| 0 -> true
| x -> odd (x-1);;
Error: Unbound value odd
The code above doesn't compile because we haven't defined the function odd yet! That's easy though. Zero is not odd, and other numbers greater than 0 are odd if their predecessor is even. So to make this complete we need that function too:
# let rec even n =
match n with
| 0 -> true
| x -> odd (x-1)
let rec odd n =
match n with
| 0 -> false
| x -> even (x-1);;
Error: Unbound value odd
The only problem is... this program doesn't compile. In order to compile the even function, we already need the definition of odd, and to compile odd we need the definition of even. So swapping the two definitions around won't help either.
There are no "forward prototypes" (as seen in languages descended from C) in OCaml, but there is a special syntax for defining a set of two or more mutually recursive functions, like odd and even:
# let rec even n =
match n with
| 0 -> true
| x -> odd (x-1)
and odd n =
match n with
| 0 -> false
| x -> even (x-1);;
val even : int -> bool = <fun>
val odd : int -> bool = <fun>
You can also use similar syntax for writing mutually recursive class definitions and modules.
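As a quick check of the let rec ... and ... form, here is a minimal sketch that can be compiled as a file. The is_even wrapper and its guard against negative numbers are additions for illustration, not part of the original example: without the guard, a negative argument would count down for an impractically long time before reaching 0.

```ocaml
(* even and odd exactly as defined above. *)
let rec even n =
  match n with
  | 0 -> true
  | x -> odd (x - 1)

and odd n =
  match n with
  | 0 -> false
  | x -> even (x - 1)

(* Hypothetical wrapper: reject negative input instead of recursing on it. *)
let is_even n =
  if n < 0 then invalid_arg "is_even: negative argument" else even n

let () =
  Printf.printf "%b %b %b\n" (is_even 10) (odd 7) (is_even 3)
```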
Aliases for function names and arguments
Recall that we talked about partial function application. It's possible to use this as a neat trick to save typing: aliasing function names, and function arguments.
Although we haven't looked at object-oriented programming (that's the subject for the "Objects" section), here's an example from OCamlNet of an aliased function call. All you need to know is that cgi # output # output_string "string" is a method call, similar to cgi.output().output_string ("string") in Java.
let begin_page cgi title =
let out = cgi # output # output_string in
out "<html>\n";
out "<head>\n";
out ("<title>" ^ text title ^ "</title>\n");
out ("<style type=\"text/css\">\n");
out "body { background: white; color: black; }\n";
out "</style>\n";
out "</head>\n";
out "<body>\n";
out ("<h1>" ^ text title ^ "</h1>\n")
The let out = ... is a partial function application for that method call (partial, because the string parameter hasn't been applied). out is therefore a function, which takes a string parameter.
out "<html>\n";
is equivalent to:
cgi # output # output_string "<html>\n";
We saved ourselves a lot of typing there.
We can also add arguments. This alternative definition of print_string
can be thought of as a kind of alias for a function name plus arguments:
let print_string = output_string stdout
output_string takes two arguments (a channel and a string), but since we have only supplied one, it is partially applied. So print_string is a function, expecting one string argument.
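The same aliasing trick works with any curried function. The sketch below mirrors the begin_page example but writes into a standard library Buffer so that it is self-contained; render_title is a hypothetical name chosen for illustration.

```ocaml
let render_title title =
  let buf = Buffer.create 64 in
  (* Partial application: the buffer is supplied, the string is not yet. *)
  let out = Buffer.add_string buf in
  out "<head>\n";
  out ("<title>" ^ title ^ "</title>\n");
  out "</head>\n";
  Buffer.contents buf

let () = print_string (render_title "Demo")
```

Here out has type string -> unit, just like the out alias in begin_page.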
Labelled and optional arguments to functions
Labelled arguments
Python has a nice syntax for writing arguments to functions. Here's an example (from the Python tutorial, since I'm not a Python programmer):
def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    ...  # function definition omitted
Here are the ways we can call this Python function:
ask_ok ('Do you really want to quit?')
ask_ok ('Overwrite the file?', 2)
ask_ok (prompt='Are you sure?')
ask_ok (complaint='Please answer yes or no!', prompt='Are you sure?')
Notice that in Python we are allowed to name arguments when we call them, or use the usual function call syntax, and we can have optional arguments with default values.
You can do something similar in Perl:
sub ask_ok
{
my %params = @_;
my $prompt = $params{prompt};
my $retries = exists $params{retries} ? $params{retries} : 4;
# ... etc.
}
ask_ok (prompt => "Are you sure?", retries => 2);
OCaml also has a way to label arguments and have optional arguments with default values.
The basic syntax is:
# let rec range ~first:a ~last:b =
if a > b then []
else a :: range ~first:(a+1) ~last:b;;
val range : first:int -> last:int -> int list = <fun>
(Notice that both to and end are reserved words in OCaml, so they cannot be used as labels. So you cannot have ~from/~to or ~start/~end.)
The type of our previous range function was:
range : int -> int -> int list
And the type of our new range function with labelled arguments is:
range : first:int -> last:int -> int list
Confusingly, the ~ (tilde) is not shown in the type definition, but you need to use it everywhere else.
With labelled arguments, it doesn't matter which order you give the arguments anymore:
# range ~first:1 ~last:10;;
- : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]
# range ~last:10 ~first:1;;
- : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]
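Because the order no longer matters, labelled arguments can also be partially applied in any order. A minimal sketch, where up_to_ten is a hypothetical name introduced for illustration:

```ocaml
let rec range ~first:a ~last:b =
  if a > b then [] else a :: range ~first:(a + 1) ~last:b

(* Supplying only ~last leaves a function still waiting for ~first. *)
let up_to_ten = range ~last:10

let () =
  List.iter (Printf.printf "%d ") (up_to_ten ~first:8);
  print_newline ()
```

The type of up_to_ten is first:int -> int list.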
There is also a shorthand way to name the arguments, so that the label is the same as the variable in the function definition. Here is a function defined in lablgtk/gaux.ml (a library of useful oddities used in lablgtk):
# let may ~f x =
match x with
| None -> ()
| Some x -> ignore(f x);;
val may : f:('a -> 'b) -> 'a option -> unit = <fun>
It's worth spending some time working out exactly what this function does, and also working out by hand its type signature. There's a lot going on. First of all, the parameter ~f is just shorthand for ~f:f (i.e. the label is ~f and the variable used in the function is f). Secondly, notice that the function takes two parameters. The second parameter (x) is unlabelled: it is permitted for a function to take a mixture of labelled and unlabelled arguments if you want.
What is the type of the labelled f parameter? Obviously it's a function of some sort.
What is the type of the unlabelled x parameter? The match clause gives us a clue. It's an 'a option.
This tells us that f takes an 'a parameter, and the return value of f is ignored, so it could be anything. The type of f is therefore 'a -> 'b.
The may function as a whole returns unit. Notice in each case of the match the result is ().
Thus the type of the may function is (and you can verify this in the OCaml interactive toplevel if you want):
may : f:('a -> 'b) -> 'a option -> unit
What does this function do? Running the function in the OCaml toplevel gives us some clues:
# may ~f:print_endline None;;
- : unit = ()
# may ~f:print_endline (Some "hello");;
hello
- : unit = ()
If the unlabelled argument is None (OCaml's closest analogue of a “null pointer”) then may does nothing. Otherwise may calls the f function on the argument. Why is this useful? We're just about to find out ...
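For comparison, the standard library's Option.iter (available from OCaml 4.08) plays much the same role as may. The difference is that its function argument is unlabelled and must return unit, whereas may ignores whatever f returns. A small sketch, assuming a recent enough compiler:

```ocaml
let may ~f x =
  match x with
  | None -> ()
  | Some x -> ignore (f x)

let () =
  (* Each of the first two calls prints "hello"; the None calls do nothing. *)
  may ~f:print_endline (Some "hello");
  Option.iter print_endline (Some "hello");
  may ~f:print_endline None;
  Option.iter print_endline None
```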
Optional arguments
Optional arguments are like labelled arguments, but we use ? instead of ~ in front of them. Here is an example:
# let rec range ?(step=1) a b =
if a > b then []
else a :: range ~step (a+step) b;;
val range : ?step:int -> int -> int -> int list = <fun>
Note the somewhat confusing syntax, switching between ? and ~. We'll talk about that in the next section. Here is how you call this function:
# range 1 10;;
- : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]
# range 1 10 ~step:2;;
- : int list = [1; 3; 5; 7; 9]
In this case, ?(step=1) fairly obviously means that ~step is an optional argument which defaults to 1. We can also omit the default value and just have an optional argument. This example is modified from lablgtk:
# type window = { mutable title: string;
mutable width: int;
mutable height: int }
let create_window () =
{ title = "none"; width = 640; height = 480; }
let set_title window title =
window.title <- title
let set_width window width =
window.width <- width
let set_height window height =
window.height <- height
let open_window ?title ?width ?height () =
let window = create_window () in
may ~f:(set_title window) title;
may ~f:(set_width window) width;
may ~f:(set_height window) height;
window;;
type window = {
mutable title : string;
mutable width : int;
mutable height : int;
}
val create_window : unit -> window = <fun>
val set_title : window -> string -> unit = <fun>
val set_width : window -> int -> unit = <fun>
val set_height : window -> int -> unit = <fun>
val open_window :
?title:string -> ?width:int -> ?height:int -> unit -> window = <fun>
This example is significantly complex and quite subtle, but the pattern used is very common in the lablgtk source code. Let's concentrate on the simple create_window function first. This function takes a unit and returns a window, initialized with default settings for title, width and height:
# create_window ();;
- : window = {title = "none"; width = 640; height = 480}
The set_title, set_width and set_height functions are impure functions which modify the window structure, in the obvious way. For example:
# let w = create_window () in
set_title w "My Application";
w;;
- : window = {title = "My Application"; width = 640; height = 480}
So far this is just the imperative "mutable records" which we talked about in the previous chapter. Now the complex part is the open_window function. This function takes four arguments: three optional ones, followed by a required, unlabelled unit. Let's first see this function in action:
# open_window ~title:"My Application" ();;
- : window = {title = "My Application"; width = 640; height = 480}
# open_window ~title:"Clock" ~width:128 ~height:128 ();;
- : window = {title = "Clock"; width = 128; height = 128}
It does what you expect, but how?! The secret is in the may function (see above) and the fact that the optional parameters don't have defaults.
When an optional parameter doesn't have a default, then it has type 'a option. The 'a would normally be inferred by type inference, so in the case of ?title above, this has type string option.
Remember the may function? It takes a function and an argument, and calls the function on the argument provided the argument isn't None. So:
may ~f:(set_title window) title;
If the optional title argument is not specified by the caller, then title = None, so may does nothing. But if we call the function with, for example,
open_window ~title:"My Application" ()
then title = Some "My Application", and may therefore calls set_title window "My Application".
You should make sure you fully understand this example before proceeding to the next section.
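One way to see how a default value relates to the option type is to write the default by hand. The sketch below is an illustration, not how the compiler is documented to implement defaults: it declares ?step without a default, so step arrives as an int option, and then picks 1 when it is None. It behaves like the earlier range with ?(step=1).

```ocaml
let rec range ?step a b =
  (* step : int option here; choose the default ourselves. *)
  let step = match step with None -> 1 | Some s -> s in
  if a > b then [] else a :: range ~step (a + step) b

let () =
  List.iter (Printf.printf "%d ") (range ~step:2 1 10);
  print_newline ()
```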
"Warning: This optional argument cannot be erased"
We've just touched upon labels and optional arguments, but even this brief explanation should have raised several questions. The first may be why the extra unit argument to open_window? Let's try defining this function without the extra unit:
# let open_window ?title ?width ?height =
let window = create_window () in
may ~f:(set_title window) title;
may ~f:(set_width window) width;
may ~f:(set_height window) height;
window;;
Warning 16: this optional argument cannot be erased.
val open_window : ?title:string -> ?width:int -> ?height:int -> window =
<fun>
Although OCaml has compiled the function, it has generated a somewhat infamous warning: "This optional argument cannot be erased", referring to the final ?height argument. To try to show what's going on here, let's call our modified open_window function:
# open_window;;
- : ?title:string -> ?width:int -> ?height:int -> window = <fun>
# open_window ~title:"My Application";;
- : ?width:int -> ?height:int -> window = <fun>
Did that work or not? No it didn't. In fact it didn't even run the open_window function at all. Instead it printed some strange type information. What's going on?
Recall currying and uncurrying, and partial application of functions. If we have a function plus defined as:
# let plus x y =
x + y;;
val plus : int -> int -> int = <fun>
We can partially apply this, for example as plus 2 which is "the function that adds 2 to things":
# let f = plus 2;;
val f : int -> int = <fun>
# f 5;;
- : int = 7
# f 100;;
- : int = 102
In the plus example, the OCaml compiler can easily work out that plus 2 doesn't have enough arguments supplied yet. It needs another argument before the plus function itself can be executed. Therefore plus 2 is a function which is waiting for its extra argument to come along.
Things are not so clear when we add optional arguments into the mix. The call to open_window;; above is a case in point. Does the user mean "execute open_window now"? Or does the user mean to supply some or all of the optional arguments later? Is open_window;; waiting for extra arguments to come along like plus 2?
OCaml plays it safe and doesn't execute open_window. Instead it treats it as a partial function application. The expression open_window literally evaluates to a function value.
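A positional argument placed after the optional ones removes the ambiguity: once it has been applied, OCaml treats any optional argument not yet given as omitted. The earlier range ?step a b already relied on this. A minimal sketch with a hypothetical greet function:

```ocaml
(* The positional name comes after the optional argument, so applying it
   tells OCaml that ?greeting was left out and the default applies. *)
let greet ?(greeting = "Hello") name = greeting ^ ", " ^ name ^ "!"

let () =
  print_endline (greet "world");
  print_endline (greet ~greeting:"Goodbye" "world")
```

A final unit argument is simply the conventional choice when a function has no natural positional argument of its own.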
Let's go back to the original and working definition of open_window where we had the extra unlabelled unit argument at the end:
# let open_window ?title ?width ?height () =
let window = create_window () in
may ~f:(set_title window) title;
may ~f:(set_width window) width;
may ~f:(set_height window) height;
window;;
val open_window :
?title:string -> ?width:int -> ?height:int -> unit -> window = <fun>
If you want to pass optional arguments to open_window you must do so before the final unit, so if you type:
# open_window ();;
- : window = {title = "none"; width = 640; height = 480}
you must mean "execute open_window now with all optional arguments unspecified". Whereas if you type:
# open_window;;
- : ?title:string -> ?width:int -> ?height:int -> unit -> window = <fun>
you mean "give me the functional value" or (more usually in the toplevel) "print out the type of open_window".
More ~ shorthand
Let's rewrite the range function yet again, this time using as much shorthand as possible for the labels:
# let rec range ~first ~last =
if first > last then []
else first :: range ~first:(first+1) ~last;;
val range : first:int -> last:int -> int list = <fun>
Recall that ~foo on its own is short for ~foo:foo. This applies when calling functions as well as when declaring their arguments, hence the ~last in the recursive call above is short for ~last:last.
Using ?foo in a function call
There's another little wrinkle concerning optional arguments. Suppose we write a function around open_window to open up an application:
# let open_application ?width ?height () =
open_window ~title:"My Application" ~width ~height ();;
Error: This expression has type 'a option
but an expression was expected of type int
Recall that ~width is shorthand for ~width:width. The type of width is 'a option, but open_window ~width: expects an int.
OCaml provides more syntactic sugar. Writing ?width in the function call is shorthand for ?width:width: it passes the option value through as it is, so Some w behaves like ~width:w and None behaves as if the argument had been left out. Conceptually you can think of this as removing the "option wrapper" around width, although no ordinary unwrap function could do that (there would be nothing to return for None). So the correct way to write this function is:
# let open_application ?width ?height () =
open_window ~title:"My Application" ?width ?height ();;
val open_application : ?width:int -> ?height:int -> unit -> window = <fun>
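Here is the whole thing as a self-contained sketch that can be compiled as a file. It inlines the setters as anonymous functions to stay short, and it passes the final () to open_window so that the call actually runs rather than producing a partial application. The last two lines also show that an option value can be handed over explicitly with ?label:value.

```ocaml
type window = {
  mutable title : string;
  mutable width : int;
  mutable height : int;
}

let may ~f x = match x with None -> () | Some x -> ignore (f x)

let open_window ?title ?width ?height () =
  let window = { title = "none"; width = 640; height = 480 } in
  may ~f:(fun t -> window.title <- t) title;
  may ~f:(fun w -> window.width <- w) width;
  may ~f:(fun h -> window.height <- h) height;
  window

(* width and height are int options here; ?width forwards them unchanged. *)
let open_application ?width ?height () =
  open_window ~title:"My Application" ?width ?height ()

let () =
  let w = open_application ~width:800 () in
  Printf.printf "%s %dx%d\n" w.title w.width w.height;
  (* Passing an option value explicitly. *)
  let w' = open_window ?width:(Some 128) ?height:None () in
  Printf.printf "%s %dx%d\n" w'.title w'.width w'.height
```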
When and when not to use ~ and ?
The syntax for labels and optional arguments is confusing, and you may often wonder when to use ~foo, when to use ?foo and when to use plain foo. It's something of a black art which takes practice to get right.
?foo is only used when declaring the arguments of a function, i.e.:
let f ?arg1 ... =
or when using the specialised "unwrap option wrapper" form for function calls:
# let open_application ?width ?height () =
open_window ~title:"My Application" ?width ?height ();;
val open_application : ?width:int -> ?height:int -> unit -> window = <fun>
The declaration ?foo creates a variable called foo, so if you need the value of ?foo, use just foo.
The same applies to labels. Only use the ~foo form when declaring arguments of a function, i.e.:
let f ~foo:foo ... =
The declaration ~foo:foo creates a variable called simply foo, so if you need the value just use plain foo.
Things, however, get complicated for two reasons: first, the shorthand form ~foo (equivalent to ~foo:foo), and second, when you call a function which takes a labelled or optional argument and you use the shorthand form.
Here is some apparently obscure code from lablgtk to demonstrate all of this:
let html ?border_width ?width ?height ?packing ?show () = (* line 1 *)
let w = create () in
load_empty w;
Container.set w ?border_width ?width ?height; (* line 4 *)
pack_return (new html w) ~packing ~show (* line 5 *)
On line 1 we have the function definition. Notice there are 5 optional arguments, and the mandatory unit 6th argument. Each of the optional arguments is going to define a variable, e.g. border_width, of type 'a option.
On line 4 we use the special ?foo form for passing optional arguments to functions which take optional arguments. Container.set is defined along these lines:
module Container = struct
let set ?border_width ?(width = -2) ?(height = -2) w =
(* ... *)
Line 5 uses the ~ shorthand. Writing this in long form:
pack_return (new html w) ~packing:packing ~show:show
The pack_return function actually takes mandatory labelled arguments called ~packing and ~show, each of type 'a option. In other words, pack_return explicitly unwraps the option wrapper.
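The difference between "optional argument" and "mandatory labelled argument whose type is an option" is easy to miss, so here is a minimal sketch of it. Both describe and make are hypothetical stand-ins, not lablgtk functions.

```ocaml
(* The labels are mandatory, but the values carried are options, so this
   function has to unwrap them itself. *)
let describe ~packing ~show =
  let packing = match packing with None -> "unpacked" | Some p -> p in
  let show = match show with None -> false | Some b -> b in
  Printf.sprintf "%s, %s" packing (if show then "shown" else "hidden")

(* The optional arguments arrive as options and are passed on with ~. *)
let make ?packing ?show () = describe ~packing ~show

let () =
  print_endline (make ());
  print_endline (make ~packing:"box" ~show:true ())
```

Here describe has type packing:string option -> show:bool option -> string, so using ?packing in the call to it would be a type error.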
Addendum
If you think labels and optional arguments are complicated, that's because they are! When this chapter was written they were a relatively new feature in OCaml, seen mostly in lablgtk. Since then they have become more common (the standard library, for instance, ships labelled variants of some modules, such as ListLabels), so it is worth getting comfortable with them.
More variants (“polymorphic variants”)
Try compiling the following C code:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
enum lock { open, close };
main ()
{
int fd, n;
char buffer[256];
fd = open ("/etc/motd", O_RDONLY); // line 12
while ((n = read (fd, buffer, sizeof buffer)) > 0)
write (1, buffer, n);
close (fd); // line 15
}
When I compile the code I get a whole bunch of errors including:
test.c: In function `main':
test.c:12: error: called object is not a function
test.c:15: error: called object is not a function
This illustrates one problem with enumerated types (enums) in C. In the example above, one enum statement reserves three symbols, namely lock, open and close. Here's another example:
enum lock { open, close };
enum door { open, close };
Compiling gives:
test.c:2: error: conflicting types for `open'
test.c:1: error: previous declaration of `open'
test.c:2: error: conflicting types for `close'
test.c:1: error: previous declaration of `close'
The first enum defines the symbol open as something of type enum lock. You cannot reuse that symbol in another enum.
This will be familiar to most C/C++ programmers, and they won't write naive code like that above. The same issue arises with OCaml variants, but OCaml provides a way to work around it.
Here is some OCaml code, which actually does compile:
# type lock = Open | Close
type door = Open | Close;;
type lock = Open | Close
type door = Open | Close
After running those two statements, what is the type of Open? We can find out easily enough in the toplevel:
# type lock = Open | Close;;
type lock = Open | Close
# type door = Open | Close;;
type door = Open | Close
# Open;;
- : door = Open
With no other information to go on, OCaml uses the most recent definition for Open, giving it the type door. This is actually not such a serious problem: if you accidentally use Open where a lock is expected, the type checker either works out from the expected type that you meant the lock constructor (type-directed disambiguation, available since OCaml 4.01) or rejects the program with a type error. Either way the mistake cannot slip through silently.
So far, so much like C. Now I said that OCaml provides a way to work around the constraint that Open can only have one type. In other words, suppose I want to use Open to mean either "the Open of type lock" or "the Open of type door" and I want OCaml to work out which one I mean.
The syntax is slightly different, but here is how we do it:
# type lock = [ `Open | `Close ];;
type lock = [ `Close | `Open ]
# type door = [ `Open | `Close ];;
type door = [ `Close | `Open ]
Notice the syntactic differences:
- Each variant name is prefixed with ` (a back tick).
- You have to put square brackets ([]) around the alternatives.
The question naturally arises: What is the type of `Open?
# `Open;;
- : [> `Open ] = `Open
[> `Open] can be read as [ `Open | and some other possibilities which we don't know about ].
The “>” (greater than) sign indicates that the set of possibilities may be bigger than those listed (open-ended).
There's nothing special about `Open. Any back-ticked word can be used as a variant tag, even one which we haven't mentioned before:
# `Foo;;
- : [> `Foo ] = `Foo
# `Foo 42;;
- : [> `Foo of int ] = `Foo 42
Let's write a function to print the state of a lock:
# let print_lock st =
match st with
| `Open -> print_endline "The lock is open"
| `Close -> print_endline "The lock is closed";;
val print_lock : [< `Close | `Open ] -> unit = <fun>
Take a careful look at the type of that function. Type inference has worked out that the st argument has type [< `Close | `Open]. The “<” (less than) sign means that this is a closed class. In other words, this function will only work on `Close or `Open and not on anything else.
# print_lock `Open;;
The lock is open
- : unit = ()
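To see the “<” and “>” forms side by side, compare a match with and without a catch-all case. This is a small sketch: the function names and the `Jammed tag are made up for illustration.

```ocaml
(* No catch-all: only `Open and `Close are accepted,
   so the argument type is [< `Close | `Open ]. *)
let lock_state st =
  match st with
  | `Open -> "open"
  | `Close -> "closed"

(* With a catch-all the argument type becomes open-ended:
   [> `Close | `Open ]. *)
let any_state st =
  match st with
  | `Open -> "open"
  | `Close -> "closed"
  | _ -> "unknown"

let () =
  print_endline (lock_state `Open);
  print_endline (any_state `Jammed)
```

Calling lock_state `Jammed would be rejected by the type checker, while any_state accepts it and falls through to the catch-all.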
Notice that print_lock works just as well with a door as with a lock! We've deliberately given up some type safety, and type inference is now being used to help guess what we mean, rather than enforce correct coding.
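The point can be checked directly. In the sketch below, describe is a hypothetical variant of print_lock that returns a string instead of printing. Values annotated as lock and as door are both accepted, because the two types are the same set of tags.

```ocaml
type lock = [ `Open | `Close ]
type door = [ `Open | `Close ]

let describe st =
  match st with
  | `Open -> "open"
  | `Close -> "closed"

let () =
  let l : lock = `Open in
  let d : door = `Close in
  (* Both calls type-check: nothing distinguishes a lock from a door. *)
  print_endline ("lock: " ^ describe l);
  print_endline ("door: " ^ describe d)
```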
This is only an introduction to polymorphic variants. Because of the reduction in type safety, it is recommended that you don't use these in your code. You will, however, see them in advanced OCaml code quite a lot precisely because advanced programmers will sometimes want to weaken the type system to write advanced idioms.