The Structure of OCaml Programs
Now we're going to take some time out to take a high-level look at some
real OCaml programs. I want to teach you about local and global
definitions, when to use ;;
vs. ;
, modules, nested functions, and
references. For this we're going to look at a lot of OCaml concepts
which won't yet make sense because we haven't seen them before. Don't
worry about the details for the moment. Concentrate instead on the
overall shape of the programs and the features which I'll point out.
Local "variables" (really local expressions)
Let's take the average
function and add a local variable in C.
(Compare it to the first definition we had above).
double average (double a, double b)
{
double sum = a + b;
return sum / 2;
}
Now let's do the same to our OCaml version:
# let average a b =
let sum = a +. b in
sum /. 2.0;;
val average : float -> float -> float = <fun>
The standard phrase let name = expression in
is used to define a named
local expression, and name
can then be used later on in the function
instead of expression
, till a ;;
which ends the block of code.
Notice that we don't indent after the in
. Just think of let ... in
as if it were a statement.
Now comparing C local variables and these named local expressions is a
sleight of hand. In fact they are somewhat different. The C variable
sum
has a slot allocated for it on the stack. You can assign to sum
later in the function if you want, or even take the address of sum
.
This is NOT true for the OCaml version. In the OCaml version, sum
is
just a shorthand name for the expression a +. b
. There is no way to
assign to sum
or change its value in any way. (We'll see how you can
do variables whose value changes in a minute).
Here's another example to make this clearer. The following two code snippets should return the same value (namely (a+b) + (a+b)²):
# let f a b =
(a +. b) +. (a +. b) ** 2.;;
val f : float -> float -> float = <fun>
# let f a b =
let x = a +. b in
x +. x ** 2.;;
val f : float -> float -> float = <fun>
The second version might be faster (but most compilers ought to be able
to perform this step of "common subexpression elimination" for you), and
it is certainly easier to read. x
in the second example is just
shorthand for a +. b
.
Global "variables" (really global expressions)
You can also define global names for things at the top level, and as with our local "variables" above, these aren't really variable at all, just shorthand names for things. Here's a real (but cut-down) example:
let html =
let content = read_whole_file file in
GHtml.html_from_string content
;;
let menu_bold () =
match bold_button#active with
| true -> html#set_font_style ~enable:[`BOLD] ()
| false -> html#set_font_style ~disable:[`BOLD] ()
;;
let main () =
(* code omitted *)
factory#add_item "Cut" ~key:_X ~callback: html#cut
;;
In this real piece of code, html
is an HTML editing widget (an object
from the lablgtk library) which is created once at the beginning of the
program by the first let html =
statement. It is then referred to in
several later functions.
Note that the html
name in the code snippet above shouldn't really be
compared to a real global variable as in C or other imperative
languages. There is no space allocated to "store" the "html
pointer".
Nor is it possible to assign anything to html
, for example to reassign
it to point to a different widget. In the next section we'll talk about
references, which are real variables.
Let-bindings
Any use of let ...
, whether at the top level (globally) or within a
function, is often called a let-binding.
References: real variables
What happens if you want a real variable that you can assign to and change throughout your program? You need to use a reference. References are very similar to pointers in C/C++. In Java, all variables which store objects are really references (pointers) to the objects. In Perl, references are references - the same thing as in OCaml.
Here's how we create a reference to an int
in OCaml:
# ref 0;;
- : int ref = {contents = 0}
Actually that statement wasn't really very useful. We created the reference and then, because we didn't name it, the garbage collector came along and collected it immediately afterwards! (actually, it was probably thrown away at compile-time.) Let's name the reference:
# let my_ref = ref 0;;
val my_ref : int ref = {contents = 0}
This reference is currently storing a zero integer. Let's put something else into it (assignment):
# my_ref := 100;;
- : unit = ()
And let's find out what the reference contains now:
# !my_ref;;
- : int = 100
So the :=
operator is used to assign to references, and the !
operator dereferences to get out the contents. Here's a rough-and-ready
comparison with C/C++:
OCaml
# let my_ref = ref 0;;
val my_ref : int ref = {contents = 0}
# my_ref := 100;;
- : unit = ()
# !my_ref;;
- : int = 100
C/C++
int a = 0; int *my_ptr = &a;
*my_ptr = 100;
*my_ptr;
References have their place, but you may find that you don't use
references very often. Much more often you'll be using
let name = expression in
to name local expressions in your function
definitions.
Nested functions
C doesn't really have a concept of nested functions. GCC supports nested functions for C programs but I don't know of any program which actually uses this extension. Anyway, here's what the gcc info page has to say about nested functions:
A "nested function" is a function defined inside another function. (Nested functions are not supported for GNU C++.) The nested function's name is local to the block where it is defined. For example, here we define a nested function named 'square', and call it twice:
foo (double a, double b)
{
double square (double z) { return z * z; }
return square (a) + square (b);
}
The nested function can access all the variables of the containing
function that are visible at the point of its definition. This is called
"lexical scoping". For example, here we show a nested function which
uses an inherited variable named offset
:
bar (int *array, int offset, int size)
{
int access (int *array, int index)
{ return array[index + offset]; }
int i;
/* ... */
for (i = 0; i < size; i++)
/* ... */ access (array, i) /* ... */
}
You get the idea. Nested functions are, however, very useful and very heavily used in OCaml. Here is an example of a nested function from some real code:
# let read_whole_channel chan =
let buf = Buffer.create 4096 in
let rec loop () =
let newline = input_line chan in
Buffer.add_string buf newline;
Buffer.add_char buf '\n';
loop ()
in
try
loop ()
with
End_of_file -> Buffer.contents buf;;
val read_whole_channel : in_channel -> string = <fun>
Don't worry about what this code does - it contains many concepts which
haven't been discussed in this tutorial yet. Concentrate instead on the
central nested function called loop
which takes just a unit argument.
You can call loop ()
from within the function read_whole_channel
,
but it's not defined outside this function. The nested function can
access variables defined in the main function (here loop
accesses the
local names buf
and chan
).
The form for nested functions is the same as for local named
expressions: let name arguments = function-definition in
.
You normally indent the function definition on a new line as in the
example above, and remember to use let rec
instead of let
if your
function is recursive (as it is in that example).
Modules and open
OCaml comes with lots of fun and interesting modules (libraries of
useful code). For example there are standard libraries for drawing
graphics, interfacing with GUI widget sets, handling large numbers, data
structures, and making POSIX system calls. These libraries are located
in /usr/lib/ocaml/
(on Unix anyway). For these examples we're
going to concentrate on one quite simple module called Graphics
.
The Graphics
module is installed into 7 files (on my system):
/usr/lib/ocaml/graphics.a
/usr/lib/ocaml/graphics.cma
/usr/lib/ocaml/graphics.cmi
/usr/lib/ocaml/graphics.cmx
/usr/lib/ocaml/graphics.cmxa
/usr/lib/ocaml/graphics.cmxs
/usr/lib/ocaml/graphics.mli
For the moment let's just concentrate on the file graphics.mli
. This
is a text file, so you can read it now. Notice first of all that the
name is graphics.mli
and not Graphics.mli
. OCaml always capitalizes
the first letter of the file name to get the module name. This can be
very confusing until you know about it!
If we want to use the functions in Graphics
there are two ways we can
do it. Either at the start of our program we have the open Graphics;;
declaration. Or we prefix all calls to the functions like this:
Graphics.open_graph
. open
is a little bit like Java's import
statement, and much more like Perl's use
statement.
To use Graphics
in the interactive toplevel, you must
first load the
library with
#load "graphics.cma";;
Windows users: For this example to work interactively on Windows, you
will need to create a custom toplevel. Issue the command ocamlmktop
-o ocaml-graphics graphics.cma
from the command line.
A couple of examples should make this clear. (The two examples draw
different things - try them out). Note the first example calls
open_graph
and the second one Graphics.open_graph
.
(* To compile this example: ocamlc graphics.cma grtest1.ml -o grtest1 *)
open Graphics;;
open_graph " 640x480";;
for i = 12 downto 1 do
let radius = i * 20 in
set_color (if i mod 2 = 0 then red else yellow);
fill_circle 320 240 radius
done;;
read_line ();;
(* To compile this example: ocamlc graphics.cma grtest2.ml -o grtest2 *)
Random.self_init ();;
Graphics.open_graph " 640x480";;
let rec iterate r x_init i =
if i = 1 then x_init
else
let x = iterate r x_init (i-1) in
r *. x *. (1.0 -. x);;
for x = 0 to 639 do
let r = 4.0 *. (float_of_int x) /. 640.0 in
for i = 0 to 39 do
let x_init = Random.float 1.0 in
let x_final = iterate r x_init 500 in
let y = int_of_float (x_final *. 480.) in
Graphics.plot x y
done
done;;
read_line ();;
Both of these examples make use of some features we haven't talked about yet: imperative-style for-loops, if-then-else and recursion. We'll talk about those later. Nevertheless you should look at these programs and try and find out (1) how they work, and (2) how type inference is helping you to eliminate bugs.
The Pervasives
module
There's one module that you never need to "open
". That is the
Pervasives
module (go and read /usr/lib/ocaml/pervasives.mli
now). All of the symbols from the Pervasives
module are automatically
imported into every OCaml program.
Renaming modules
What happens if you want to use symbols in the Graphics
module, but
you don't want to import all of them and you can't be bothered to type
Graphics
each time? Just rename it using this trick:
module Gr = Graphics;;
Gr.open_graph " 640x480";;
Gr.fill_circle 320 240 240;;
read_line ();;
Actually this is really useful when you want to import a nested module (modules can be nested inside one another), but you don't want to type out the full path to the nested module name each time.
The sequence operator ;
The semi-colon ;
is an operator, just like +
is. Well, not quite just like
+
is, but conceptually the same. The operator +
has type int -> int -> int
—
it takes two ints and returns an int (the sum). The semi-colon ;
may
be seen as having type
unit -> 'b -> 'b
— it takes two values and simply returns the second
one, the first expression is guaranteed to be evaluated before the
second. Rather like C's ,
(comma) operator. You can write
a; b; c; d
just as easily as you can write a + b + c + d
.
This is one of those "mental leaps" which is never spelled out very
well — in OCaml, nearly everything is an expression. if/then/else
is
an expression. a; b
is an expression. match foo with ...
is an
expression. The following code is perfectly legal (and all do the same
thing):
# let f x b y = if b then x+y else x+0
let f x b y = x + (if b then y else 0)
let f x b y = x + (match b with true -> y | false -> 0)
let f x b y = x + (let g z = function true -> z | false -> 0 in g y b)
let f = fun x -> fun b -> fun y -> if b then x+y else x+0
let f x b y = x + (let _ = y + 3 in (); if b then y else 0);;
val f : int -> bool -> int -> int = <fun>
Note especially the last one — I'm using ;
as an operator to "join"
two statements. All functions in OCaml can be expressed as:
let name [parameters] = expression ;;
OCaml's definition of what is an expression is just a little wider
than C's. In fact, C has the concept of "statements" — but all of C's
statements are just expressions in OCaml of type unit
(combined with the ;
operator).
The one place that ;
is different from +
is that I can refer to
+
just like a function. For instance, I can define a sum_list
function, to sum a list of ints, like:
# let sum_list = List.fold_left ( + ) 0;;
val sum_list : int list -> int = <fun>
The disappearance of ;;
Now we're going to look at a very important issue. All examples above
ended with a double semi-colon ;;
. However, if you look at OCaml code
outside of tutorials, you will find whole code bases that does not
use ;;
, not even once.
The truth is that ;;
is mostly used in the toplevel and tutorials to
mark the end of an OCaml phrase and send this phrase to the toplevel
in order to evaluate it.
Outside of the toplevel, uses of ;;
are, at best, infrequent
and are never required for well written code.
Briefly, double semi-colon ;;
can used for three reasons:
- For compatibility with the toplevel;
- To split the code to ease debugging;
- To introduce a toplevel expression.
Inserting ;;
can be sometimes useful for beginners when debugging,
since it stops any running definition. For instance, in the following
example, the definition of f
does not stop at line 1 due to the
comma ,
operator. Consequently, the compiler raises an error at the
end of the second line:
let f x = x,
let g = x * x
Inserting a double semi-colon between f
and g
detangles
the definition of f
and g
:
let f x = x,
;;
let g = x * x
Another use of ;;
is to introduce a new toplevel expression after
some definitions:
let b = "This started with"
let s = "a very nonsensical message.";;
print_endline b; print_endline s
open String
let s = concat "" ["Fortunately"; ", "; "the"; "end"; "is"; "near"; "."];;
print_endline s;;
let s = "let end here" in print_endline s
In particular, in the above examples, all lines starting after ;;
are
purely effectful and deleting them will only affect the effect of the code,
not the following definitions.
However, this use of ;;
can (should) always be replaced by either
let () = expression ()
if the result of the expression is unit
, or
let _ = expression ()
otherwise. Note that the first form is safer, since it requires that
the type of the returned expression is unit
; preventing us, for instance,
from forgetting an argument in
let () =
print_newline
(* here, we forgot () and the compiler will complain. *)
With this convention, there are no toplevel expressions anymore: any
module can be written as a sequence of definitions. Consequently, some
style guidelines consider that ;;
should never be used outside of the
toplevel (see for instance these style guidelines).
Putting it all together: some real code
In this section we're going to show some real code fragments from the
lablgtk 1.2 library. (Lablgtk is the OCaml interface to the native Unix
Gtk widget library). A word of warning: these fragments contain a lot of
ideas which we haven't discussed yet. Don't look at the details, look
instead at the overall shape of the code, where the authors used ;;
,
where they used ;
and where they used open
, how they indented the
code, how they used local and global named expressions.
... However, I'll give you some clues so you don't get totally lost!
?foo
and~foo
is OCaml's way of doing optional and named arguments to functions. There is no real parallel to this in C-derived languages, but Perl, Python and Smalltalk all have this concept that you can name the arguments in a function call, omit some of them, and supply the others in any order you like.foo#bar
is a method invocation (calling a method calledbar
on an object calledfoo
). It's similar tofoo->bar
orfoo.bar
or$foo->bar
in C++, Java or Perl respectively.
First snippet: The programmer opens a couple of standard libraries
(eliding the ;;
because the next keyword is open
and let
respectively). They then create a function called file_dialog
. Inside
this function they define a named expression called sel
using a
two-line let sel = ... in
statement. Then they call several methods on
sel
.
(* First snippet *)
open StdLabels
open GMain
let file_dialog ~title ~callback ?filename () =
let sel =
GWindow.file_selection ~title ~modal:true ?filename () in
sel#cancel_button#connect#clicked ~callback:sel#destroy;
sel#ok_button#connect#clicked ~callback:do_ok;
sel#show ()
Second snippet: Just a long list of global names at the top level.
Notice that the author elided every single one of the ;;
because of
Rule #2.
(* Second snippet *)
let window = GWindow.window ~width:500 ~height:300 ~title:"editor" ()
let vbox = GPack.vbox ~packing:window#add ()
let menubar = GMenu.menu_bar ~packing:vbox#pack ()
let factory = new GMenu.factory menubar
let accel_group = factory#accel_group
let file_menu = factory#add_submenu "File"
let edit_menu = factory#add_submenu "Edit"
let hbox = GPack.hbox ~packing:vbox#add ()
let editor = new editor ~packing:hbox#add ()
let scrollbar = GRange.scrollbar `VERTICAL ~packing:hbox#pack ()
Third snippet: The author imports all the symbols from the GdkKeysyms
module. Now we have an unusual let-binding. let _ = expression
means
"calculate the value of the expression (with all the side-effects that
may entail), but throw away the result". In this case, "calculate the
value of the expression" means to run Main.main ()
which is Gtk's main
loop, which has the side-effect of popping the window onto the screen
and running the whole application. The "result" of Main.main ()
is
insignificant - probably a unit
return value, but I haven't checked -
and it doesn't get returned until the application finally exits.
Notice in this snippet how we have a long series of essentially procedural commands. This is really a classic imperative program.
(* Third snippet *)
open GdkKeysyms
let () =
window#connect#destroy ~callback:Main.quit;
let factory = new GMenu.factory file_menu ~accel_group in
factory#add_item "Open..." ~key:_O ~callback:editor#open_file;
factory#add_item "Save" ~key:_S ~callback:editor#save_file;
factory#add_item "Save as..." ~callback:editor#save_dialog;
factory#add_separator ();
factory#add_item "Quit" ~key:_Q ~callback:window#destroy;
let factory = new GMenu.factory edit_menu ~accel_group in
factory#add_item "Copy" ~key:_C ~callback:editor#text#copy_clipboard;
factory#add_item "Cut" ~key:_X ~callback:editor#text#cut_clipboard;
factory#add_item "Paste" ~key:_V ~callback:editor#text#paste_clipboard;
factory#add_separator ();
factory#add_check_item "Word wrap" ~active:false
~callback:editor#text#set_word_wrap;
factory#add_check_item "Read only" ~active:false
~callback:(fun b -> editor#text#set_editable (not b));
window#add_accel_group accel_group;
editor#text#event#connect#button_press
~callback:(fun ev ->
let button = GdkEvent.Button.button ev in
if button = 3 then begin
file_menu#popup ~button ~time:(GdkEvent.Button.time ev); true
end else false);
editor#text#set_vadjustment scrollbar#adjustment;
window#show ();
Main.main ()