The Basics
Running OCaml code
The easiest way to get started is to run an interactive session in your browser thanks to TryOCaml.
To install OCaml on your computer, see the Install documentation.
To quickly try small OCaml expressions, you can use an interactive toplevel, or REPL (Read–Eval–Print Loop). The ocaml command provides a very basic toplevel (you should install rlwrap through your system package manager and run rlwrap ocaml to get history navigation). If you can install it through OPAM or your system package manager, we recommend the use of the utop toplevel instead, which has the same basic interface but is much more convenient to use (history navigation, auto-completion, etc.).
Use ;; to indicate that you've finished entering each statement. Here is what it looks like when running ocaml:
$ ocaml
OCaml version 4.10.0
# 1+1;;
- : int = 2
This is what running the same code looks like in utop:
───────┬────────────────────────────────────────────────────────────┬─────
       │ Welcome to utop version 1.18 (using OCaml version 4.02.3)! │
       └────────────────────────────────────────────────────────────┘
Type #utop_help for help about using utop.
─( 10:12:16 )─< command 0 >───────────────────────────────────────────────
utop # 1 + 1;;
- : int = 2
To compile an OCaml program named my_prog.ml to a native executable, use ocamlbuild my_prog.native:
$ mkdir my_project
$ cd my_project
$ echo 'let () = print_endline "Hello, World!"' > my_prog.ml
$ ocamlbuild my_prog.native
Finished, 4 targets (0 cached) in 00:00:00.
$ ./my_prog.native
Hello, World!
See Compiling OCaml projects for more information.
Comments
OCaml comments are delimited by (* and *), like this:
(* This is a single-line comment. *)
(* This is a
* multi-line
* comment.
*)
In other words, the commenting convention is very similar to original C (/* ... */). There is currently no single-line comment syntax (like # ... in Perl or // ... in C99/C++/Java).
OCaml counts nested (* ... *) blocks, and this allows you to comment out regions of code very easily:
(* This code is broken ...
(* Primality test. *)
let is_prime n =
(* note to self: ask about this on the mailing lists *) XXX;;
*)
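As a small sketch of the nesting rule (the names unused and answer are made up for illustration), the following file compiles because the inner comment is closed before the outer one, so everything up to the final *) is ignored:

```ocaml
(* Outer comment starts here ...
   (* an inner comment, fully nested *)
   let unused = "this line is commented out too"
   ... and the outer comment ends here. *)
let answer = 42

let () = Printf.printf "answer = %d\n" answer
```

Only the definition of answer and the final print are seen by the compiler.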
Calling functions
Let's say you've written a function, which we'll call repeated, that takes a string s and a number n and returns a new string containing the original s repeated n times.
In most C-derived languages a call to this function will look like this:
repeated ("hello", 3) /* this is C code */
This means "call the function repeated with two arguments, first argument the string hello and second argument the number 3".
OCaml, in common with other functional languages, writes and brackets function calls differently, and this is the cause of many mistakes. Here is the same function call in OCaml:
repeated "hello" 3 (* this is OCaml code *)
Note — no brackets, and no comma between the arguments.
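The text only describes what repeated does, not how it is written. Here is one possible definition so that you can try the call yourself. This body is an assumption made for illustration (it uses String.concat and List.init from the standard library), not the tutorial's own implementation:

```ocaml
(* Hypothetical implementation of repeated: build a list holding n copies
   of s, then join them with an empty separator. *)
let repeated s n =
  String.concat "" (List.init n (fun _ -> s))

(* The call uses no brackets and no comma between the arguments. *)
let () = print_endline (repeated "hello" 3)
```

Running this prints hellohellohello.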
The syntax repeated ("hello", 3) is meaningful in OCaml. It means "call the function repeated with ONE argument, that argument being a 'pair' structure of two elements". Of course that would be a mistake, because the repeated function is expecting two arguments, not one, and the first argument should be a string, not a pair. But let's not worry about pairs ("tuples") just yet. Instead, just remember that it's a mistake to put the brackets and commas in around function call arguments.
Let's have another function, prompt_string, which takes a string to prompt and returns the string entered by the user. We want to pass this string into repeated. Here are the C and OCaml versions:
/* C code: */
repeated (prompt_string ("Name please: "), 3)
(* OCaml code: *)
repeated (prompt_string "Name please: ") 3
Take a careful look at the bracketing and the missing comma. In the OCaml version, the brackets enclose the first argument of repeated because that argument is the result of another function call. In general the rule is: "bracket around the whole function call — don't put brackets around the arguments to a function call". Here are some more examples:
f 5 (g "hello") 3 (* f has three arguments, g has one argument *)
f (g 3 4) (* f has one argument, g has two arguments *)
# repeated ("hello", 3) (* OCaml will spot the mistake *);;
Error: This expression has type 'a * 'b
but an expression was expected of type string
Defining a function
We all know how to define a function (or static method, for Java-heads) in our existing languages. How do we do it in OCaml?
The OCaml syntax is pleasantly concise. Here's a function which takes two floating point numbers and calculates the average:
let average a b =
  (a +. b) /. 2.0;;
Type this into the OCaml interactive toplevel (on Unix, type the command ocaml from the shell) and you'll see this:
# let average a b =
(a +. b) /. 2.0;;
val average : float -> float -> float = <fun>
If you look at the function definition closely, and also at what OCaml prints back at you, you'll have a number of questions:
- What're all those extra periods doing there in the code?
- What does all that stuff about float -> float -> float mean?
I'll answer those questions in the next sections, but first I want to go
and define the same function in C (the Java definition would be fairly
similar to C), and hopefully that should raise even more questions.
Here's our C version of average:
double average (double a, double b)
{
  return (a + b) / 2;
}
Now look at our much shorter OCaml definition above. Hopefully you'll be asking:
- Why don't we have to define the types of a and b in the OCaml version? How does OCaml know what the types are (indeed, does OCaml know what the types are, or is OCaml completely dynamically typed?).
- In C, the 2 is implicitly converted into a double, can't OCaml do the same thing?
- What is the OCaml way to write return?
OK, let's get some answers.
- OCaml is a strongly statically typed language (in other words, there's nothing dynamic going on between int, float and string, as in Perl).
- OCaml uses type inference to work out the types, so you don't have to. If you use the OCaml interactive toplevel as above, then OCaml will tell you its inferred type for your function.
- OCaml doesn't do any implicit casting. If you want a float, you have to write 2.0 because 2 is an integer. OCaml does no automatic conversion between int, float, string or any other type.
- As a side-effect of type inference in OCaml, functions (including operators) can't have overloaded definitions. OCaml defines + as the integer addition function. To add floats, use +. (note the trailing period). Similarly, use -., *., /. for other float operations.
- OCaml doesn't have a return keyword: the last expression in a function becomes the result of the function automatically.
We will present more details in the following sections and chapters.
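To make the answers above concrete, here is a short sketch that puts the float operators next to their integer counterparts. The function average_int is not part of the tutorial; it is added here only for comparison:

```ocaml
(* Float version: +. and /. with the float literal 2.0. *)
let average a b =
  (a +. b) /. 2.0

(* Integer version (hypothetical, for comparison): + and / with the
   integer literal 2. Integer division discards the remainder. *)
let average_int a b =
  (a + b) / 2

let () =
  Printf.printf "%g\n" (average 1.0 2.0);   (* prints 1.5 *)
  Printf.printf "%d\n" (average_int 1 2)    (* prints 1 *)
```

In both functions the value of the last expression is the result; there is no return anywhere.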
Basic types
The basic types in OCaml are:
OCaml type   Range
int          31-bit signed int (roughly +/- 1 billion) on 32-bit processors, or 63-bit signed int on 64-bit processors
float        IEEE double-precision floating point, equivalent to C's double
bool         A boolean, written either true or false
char         An 8-bit character
string       A string
unit         Written as ()
OCaml uses one of the bits in an int internally in order to be able to automatically manage the memory use (garbage collection). This is why the basic int is 31 bits, not 32 bits (63 bits if you're using a 64-bit platform). In practice this isn't an issue except in a few specialised cases. For example if you're counting things in a loop on a 32-bit platform, then OCaml limits you to counting up to 1 billion instead of 2 billion. This isn't going to be a problem because if you're counting things close to this limit in any language, then you ought to be using bignums (the Nat and Big_int modules in OCaml). However if you need to do things such as processing 32-bit types (e.g. you're writing crypto code or a network stack), OCaml provides fixed-width int32 and int64 types, as well as a nativeint type which matches the native integer type for your platform.
OCaml doesn't have a basic unsigned integer type, but you can get the same effect using nativeint. OCaml doesn't have builtin single-precision floating point numbers.
OCaml provides a char type which is used for characters, written 'x' for example. Unfortunately the char type does not support Unicode or UTF-8. This is a serious flaw in OCaml which should be fixed, but for the time being there are comprehensive Unicode libraries which work around it.
Strings are not just lists of characters. They have their own, more efficient internal representation.
The unit type is sort of like void in C, but we'll talk about it more below.
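Here is a brief sketch with one value of each basic type. The type annotations are optional and written out only for clarity, and the variable names are arbitrary. It also prints the width of int on your machine using Sys.int_size and max_int from the standard library, so you can check the 31-bit or 63-bit claim for yourself:

```ocaml
let i : int = 42
let f : float = 3.14
let b : bool = true
let c : char = 'x'
let s : string = "hello"
let u : unit = ()

let () =
  (* The numbers printed on this line depend on the platform. *)
  Printf.printf "int is %d bits wide; max_int = %d\n" Sys.int_size max_int;
  Printf.printf "%d %g %b %c %s\n" i f b c s;
  ignore u
```

The second line of output is 42 3.14 true x hello on any platform.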
Implicit vs. explicit casts
In C-derived languages ints get promoted to floats in certain circumstances. For example if you write 1 + 2.5 then the first argument (which is an integer) is promoted to a floating point number, and the result is also a floating point number. It's as if you had written ((double) 1) + 2.5, but all done implicitly.
OCaml never does implicit casts like this. In OCaml, 1 + 2.5 is a type error. The + operator in OCaml requires two ints as arguments, and here we're giving it an int and a float, so it reports this error:
# 1 + 2.5;;
Error: This expression has type float but an expression was expected of type
int
(In the "translated from the French" language of OCaml error messages this means "you put a float here, but I was expecting an int").
To add two floats together you need to use a different operator, +. (note the trailing period).
OCaml doesn't promote ints to floats automatically so this is also an error:
# 1 +. 2.5;;
Error: This expression has type int but an expression was expected of type
float
Here OCaml is now complaining about the first argument.
What if you actually want to add an integer and a floating point number together? (Say they are stored as i and f.) In OCaml you need to explicitly cast:
(float_of_int i) +. f
float_of_int is a function which takes an int and returns a float. There are a whole load of these functions, called such things as int_of_float, char_of_int, int_of_char, string_of_int and so on, and they mostly do what you expect.
Since converting an int to a float is a particularly common operation, the float_of_int function has a shorter alias: the above example could simply have been written
float i +. f
(Note that unlike C, it is perfectly valid in OCaml for a type and a function to have the same name.)
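The conversion functions named above can be tried out in a few lines. This is just a sketch; the values bound to i and f and the other variable names are arbitrary choices for the example:

```ocaml
let i = 3
let f = 2.5

let sum = float_of_int i +. f        (* 5.5 *)
let sum' = float i +. f              (* same result via the shorter alias *)
let truncated = int_of_float 2.9     (* 2: the fractional part is dropped *)
let text = string_of_int 42          (* "42" *)
let code = int_of_char 'A'           (* 65 *)
let letter = char_of_int 97          (* 'a' *)

let () = Printf.printf "%g %d %s %d %c\n" sum truncated text code letter
```

Note that int_of_float truncates rather than rounds, which is one of the places where "mostly do what you expect" deserves a second look.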
Is implicit or explicit casting better?
You might think that these explicit casts are ugly, time-consuming even, and you have a point, but there are at least three arguments in their favour. Firstly, OCaml needs this explicit casting to be able to do type inference (see below), and type inference is such a wonderful time-saving feature that it easily offsets the extra keyboarding of explicit casts. Secondly, if you've spent time debugging C programs you'll know that (a) implicit casts cause errors which are hard to find, and (b) much of the time you're sitting there trying to work out where the implicit casts happen. Making the casts explicit helps you in debugging. Thirdly, some casts (particularly int <-> float) are actually very expensive operations. You do yourself no favours by hiding them.
Ordinary functions and recursive functions
Unlike in C-derived languages, a function isn't recursive unless you explicitly say so by using let rec instead of just let. Here's an example of a recursive function:
# let rec range a b =
    if a > b then []
    else a :: range (a+1) b;;
val range : int -> int -> int list = <fun>
Notice that range calls itself.
The only difference between let and let rec is in the scoping of the function name. If the above function had been defined with just let, then the call to range would have tried to look for an existing (previously defined) function called range, not the currently-being-defined function. Using let (without rec) allows you to re-define a value in terms of the previous definition. For example:
# let positive_sum a b =
    let a = max a 0
    and b = max b 0 in
    a + b;;
val positive_sum : int -> int -> int = <fun>
This redefinition hides the previous "bindings" of a and b from the function definition. In some situations coders prefer this pattern to using a new variable name (let a_pos = max a 0) as it makes the old binding inaccessible, so that only the latest values of a and b are accessible.
There is no performance difference between functions defined using let and functions defined using let rec, so if you prefer you could always use the let rec form and get the same semantics as C-like languages.
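The scoping difference can also be seen with function names, not only with arguments. In the sketch below, the function step is made up for illustration: its second definition uses plain let, so the step on its right-hand side refers to the earlier definition rather than to itself:

```ocaml
(* With rec, the name range inside the body refers to this very function. *)
let rec range a b =
  if a > b then []
  else a :: range (a + 1) b

(* Without rec, the name on the right-hand side is looked up among the
   definitions that already exist. *)
let step x = x + 1
let step x = step x * 2   (* calls the earlier step: (x + 1) * 2 *)

let () =
  assert (range 1 5 = [1; 2; 3; 4; 5]);
  assert (step 3 = 8)
```

Had the second step been written with let rec, it would call itself forever instead of reusing the first definition.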
Types of functions
Because of type inference you will rarely if ever need to explicitly write down the type of your functions. However, OCaml often prints out what it thinks are the types of your functions, so you need to know the syntax for this. For a function f which takes arguments arg1, arg2, ... argn, and returns type rettype, the compiler will print:
f : arg1 -> arg2 -> ... -> argn -> rettype
The arrow syntax looks strange now, but when we come to so-called "currying" later you'll see why it was chosen. For now I'll just give you some examples.
Our function repeated which takes a string and an integer and returns a string has type:
repeated : string -> int -> string
Our function average which takes two floats and returns a float has type:
average : float -> float -> float
The OCaml standard int_of_char casting function:
int_of_char : char -> int
If a function returns nothing (void for C and Java programmers), then we write that it returns the unit type. Here, for instance, is the OCaml equivalent of fputc:
output_char : out_channel -> char -> unit
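Although you rarely need to, you can write these types down yourself and let the compiler check them. The sketch below annotates a few definitions with the arrow syntax; the names code_of and put_char are invented here and simply wrap the standard int_of_char and output_char:

```ocaml
(* The annotation after the colon must agree with what OCaml infers. *)
let average : float -> float -> float =
  fun a b -> (a +. b) /. 2.0

let code_of : char -> int = int_of_char

(* A function that returns nothing useful has unit as its result type. *)
let put_char : out_channel -> char -> unit = output_char

let () =
  put_char stdout 'x';
  print_newline ()
```

If an annotation disagrees with the definition, for example by claiming that average returns an int, the compiler rejects the program.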
Polymorphic functions
Now for something a bit stranger. What about a function which takes anything as an argument? Here's an odd function which takes an argument, but just ignores it and always returns 3:
let give_me_a_three x = 3
What is the type of this function? In OCaml we use a special placeholder to mean "any type you fancy". It's a single quote character followed by a letter. The type of the above function would normally be written:
give_me_a_three : 'a -> int
where 'a really does mean any type. You can, for example, call this function as give_me_a_three "foo" or give_me_a_three 2.0 and both are quite valid expressions in OCaml.
It won't be clear yet why polymorphic functions are useful, but they are very useful and very common, and so we'll discuss them later on. (Hint: polymorphism is kind of like templates in C++ or generics in Java 1.5).
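A quick way to convince yourself is to call the same function with arguments of different types. The identity function below is not from the tutorial; it is added as a second, assumed example of a polymorphic function, this time with type 'a -> 'a:

```ocaml
let give_me_a_three x = 3

(* Hypothetical second example: returns its argument unchanged. *)
let identity x = x

let () =
  assert (give_me_a_three "foo" = 3);
  assert (give_me_a_three 2.0 = 3);
  assert (identity 5 = 5);
  assert (identity "five" = "five")
```

All four calls type-check, because 'a is filled in separately at each call.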
Type inference
So the theme of this tutorial is that functional languages have many really cool features, and OCaml is a language which has all of these really cool features stuffed into it at once, thus making it a very practical language for real programmers to use. But the odd thing is that most of these cool features have nothing to do with "functional programming" at all. In fact, I've come to the first really cool feature, and I still haven't talked about why functional programming is called "functional". Anyway, here's the first really cool feature: type inference.
Simply put: you don't need to declare the types of your functions and variables, because OCaml will just figure them out for you!
In addition OCaml goes on to check all your types match up (even across different files).
But OCaml is also a practical language, and for this reason it contains backdoors into the type system allowing you to bypass this checking on the rare occasions that it is sensible to do this. Only gurus will probably need to bypass the type checking.
Let's go back to the average function which we typed into the OCaml interactive toplevel:
# let average a b =
(a +. b) /. 2.0;;
val average : float -> float -> float = <fun>
Mirabile dictu! OCaml worked out all on its own that the function takes two float arguments and returns a float.
How did it do this? Firstly it looks at where a and b are used, namely in the expression (a +. b). Now, +. is itself a function which always takes two float arguments, so by simple deduction, a and b must both also have type float.
Secondly, the /. function returns a float, and this is the same as the return value of the average function, so average must return a float. The conclusion is that average has this type signature:
average : float -> float -> float
Type inference is obviously easy for such a short program, but it works even for large programs, and it's a major time-saving feature because it removes a whole class of errors which cause segfaults, NullPointerExceptions and ClassCastExceptions in other languages (or important but often ignored runtime warnings, as in Perl).
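The same deduction carries across function boundaries. In this sketch, the function describe is made up for illustration. It has no annotations at all, yet OCaml works out that its argument must be a float simply because that argument is handed to average:

```ocaml
let average a b = (a +. b) /. 2.0
(* inferred: float -> float -> float *)

let describe x = "average: " ^ string_of_float (average x 10.0)
(* inferred: float -> string, because x is passed to average *)

let () = print_endline (describe 5.0)

(* describe "ten" would be rejected at compile time, before the program
   ever runs, since a string is not a float. *)
```

This is the sense in which a whole class of errors is removed: the mistaken call never makes it into a running program.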