This part of the manual is a tutorial introduction to the OCaml language. A good familiarity with programming in a conventional languages (say, Pascal or C) is assumed, but no prior exposure to functional languages is required. The present chapter introduces the core language. Chapter 2 deals with the module system, chapter 3 with the object-oriented features, chapter 4 with extensions to the core language (labeled arguments and polymorphic variants), and chapter 6 gives some advanced examples.
For this overview of OCaml, we use the interactive system, which is started by running ocaml from the Unix shell, or by launching the OCamlwin.exe application under Windows. This tutorial is presented as the transcript of a session with the interactive system: lines starting with # represent user input; the system responses are printed below, without a leading #.
Under the interactive system, the user types OCaml phrases terminated by ;; in response to the # prompt, and the system compiles them on the fly, executes them, and prints the outcome of evaluation. Phrases are either simple expressions, or let definitions of identifiers (either values or functions).
The OCaml system computes both the value and the type for each phrase. Even function parameters need no explicit type declaration: the system infers their types from their usage in the function. Notice also that integers and floating-point numbers are distinct types, with distinct operators: + and * operate on integers, but +. and *. operate on floats.
Recursive functions are defined with the let rec binding:
In addition to integers and floating-point numbers, OCaml offers the usual basic data types: booleans, characters, and immutable character strings.
Predefined data structures include tuples, arrays, and lists. General mechanisms for defining your own data structures are also provided. They will be covered in more details later; for now, we concentrate on lists. Lists are either given in extension as a bracketed list of semicolon-separated elements, or built from the empty list [] (pronounce “nil”) by adding elements in front using the :: (“cons”) operator.
As with all other OCaml data structures, lists do not need to be explicitly allocated and deallocated from memory: all memory management is entirely automatic in OCaml. Similarly, there is no explicit handling of pointers: the OCaml compiler silently introduces pointers where necessary.
As with most OCaml data structures, inspecting and destructuring lists is performed by pattern-matching. List patterns have the exact same shape as list expressions, with identifier representing unspecified parts of the list. As an example, here is insertion sort on a list:
The type inferred for sort, 'a list -> 'a list, means that sort can actually apply to lists of any type, and returns a list of the same type. The type 'a is a type variable, and stands for any given type. The reason why sort can apply to lists of any type is that the comparisons (=, <=, etc.) are polymorphic in OCaml: they operate between any two values of the same type. This makes sort itself polymorphic over all list types.
The sort function above does not modify its input list: it builds and returns a new list containing the same elements as the input list, in ascending order. There is actually no way in OCaml to modify in-place a list once it is built: we say that lists are immutable data structures. Most OCaml data structures are immutable, but a few (most notably arrays) are mutable, meaning that they can be modified in-place at any time.
OCaml is a functional language: functions in the full mathematical sense are supported and can be passed around freely just as any other piece of data. For instance, here is a deriv function that takes any float function as argument and returns an approximation of its derivative function:
Even function composition is definable:
Functions that take other functions as arguments are called “functionals”, or “higher-order functions”. Functionals are especially useful to provide iterators or similar generic operations over a data structure. For instance, the standard OCaml library provides a List.map functional that applies a given function to each element of a list, and returns the list of the results:
This functional, along with a number of other list and array functionals, is predefined because it is often useful, but there is nothing magic with it: it can easily be defined as follows.
User-defined data structures include records and variants. Both are defined with the type declaration. Here, we declare a record type to represent rational numbers.
Record fields can also be accessed through pattern-matching:
Since there is only one case in this pattern matching, it is safe to expand directly the argument r in a record pattern:
Unneeded fields can be omitted:
Optionally, missing fields can be made explicit by ending the list of fields with a trailing wildcard _::
When both sides of the = sign are the same, it is possible to avoid repeating the field name by eliding the =field part:
This short notation for fields also works when constructing records:
At last, it is possible to update few fields of a record at once:
With this functional update notation, the record on the left-hand side of with is copied except for the fields on the right-hand side which are updated.
The declaration of a variant type lists all possible shapes for values of that type. Each case is identified by a name, called a constructor, which serves both for constructing values of the variant type and inspecting them by pattern-matching. Constructor names are capitalized to distinguish them from variable names (which must start with a lowercase letter). For instance, here is a variant type for doing mixed arithmetic (integers and floats):
This declaration expresses that a value of type number is either an integer, a floating-point number, or the constant Error representing the result of an invalid operation (e.g. a division by zero).
Enumerated types are a special case of variant types, where all alternatives are constants:
To define arithmetic operations for the number type, we use pattern-matching on the two numbers involved:
Another interesting example of variant type is the built-in 'a option type which represents either a value of type 'a or an absence of value:
This type is particularly useful when defining function that can fail in common situations, for instance
The most common usage of variant types is to describe recursive data structures. Consider for example the type of binary trees:
This definition reads as follows: a binary tree containing values of type 'a (an arbitrary type) is either empty, or is a node containing one value of type 'a and two subtrees containing also values of type 'a, that is, two 'a btree.
Operations on binary trees are naturally expressed as recursive functions following the same structure as the type definition itself. For instance, here are functions performing lookup and insertion in ordered binary trees (elements increase from left to right):
Though all examples so far were written in purely applicative style, OCaml is also equipped with full imperative features. This includes the usual while and for loops, as well as mutable data structures such as arrays. Arrays are either given in extension between [| and |] brackets, or allocated and initialized with the Array.make function, then filled up later by assignments. For instance, the function below sums two vectors (represented as float arrays) componentwise.
Record fields can also be modified by assignment, provided they are declared mutable in the definition of the record type:
OCaml has no built-in notion of variable – identifiers whose current value can be changed by assignment. (The let binding is not an assignment, it introduces a new identifier with a new scope.) However, the standard library provides references, which are mutable indirection cells (or one-element arrays), with operators ! to fetch the current contents of the reference and := to assign the contents. Variables can then be emulated by let-binding a reference. For instance, here is an in-place insertion sort over arrays:
References are also useful to write functions that maintain a current state between two calls to the function. For instance, the following pseudo-random number generator keeps the last returned number in a reference:
Again, there is nothing magical with references: they are implemented as a single-field mutable record, as follows.
In some special cases, you may need to store a polymorphic function in a data structure, keeping its polymorphism. Without user-provided type annotations, this is not allowed, as polymorphism is only introduced on a global level. However, you can give explicitly polymorphic types to record fields.
OCaml provides exceptions for signalling and handling exceptional conditions. Exceptions can also be used as a general-purpose non-local control structure. Exceptions are declared with the exception construct, and signalled with the raise operator. For instance, the function below for taking the head of a list uses an exception to signal the case where an empty list is given.
Exceptions are used throughout the standard library to signal cases where the library functions cannot complete normally. For instance, the List.assoc function, which returns the data associated with a given key in a list of (key, data) pairs, raises the predefined exception Not_found when the key does not appear in the list:
Exceptions can be trapped with the try…with construct:
The with part is actually a regular pattern-matching on the exception value. Thus, several exceptions can be caught by one try…with construct. Also, finalization can be performed by trapping all exceptions, performing the finalization, then raising again the exception:
We finish this introduction with a more complete example representative of the use of OCaml for symbolic processing: formal manipulations of arithmetic expressions containing variables. The following variant type describes the expressions we shall manipulate:
We first define a function to evaluate an expression given an environment that maps variable names to their values. For simplicity, the environment is represented as an association list.
Now for a real symbolic processing, we define the derivative of an expression with respect to a variable dv:
As shown in the examples above, the internal representation (also called abstract syntax) of expressions quickly becomes hard to read and write as the expressions get larger. We need a printer and a parser to go back and forth between the abstract syntax and the concrete syntax, which in the case of expressions is the familiar algebraic notation (e.g. 2*x+1).
For the printing function, we take into account the usual precedence rules (i.e. * binds tighter than +) to avoid printing unnecessary parentheses. To this end, we maintain the current operator precedence and print parentheses around an operator only if its precedence is less than the current precedence.
All examples given so far were executed under the interactive system. OCaml code can also be compiled separately and executed non-interactively using the batch compilers ocamlc and ocamlopt. The source code must be put in a file with extension .ml. It consists of a sequence of phrases, which will be evaluated at runtime in their order of appearance in the source file. Unlike in interactive mode, types and values are not printed automatically; the program must call printing functions explicitly to produce some output. Here is a sample standalone program to print Fibonacci numbers:
(* File fib.ml *) let rec fib n = if n < 2 then 1 else fib (n-1) + fib (n-2);; let main () = let arg = int_of_string Sys.argv.(1) in print_int (fib arg); print_newline (); exit 0;; main ();;
Sys.argv is an array of strings containing the command-line parameters. Sys.argv.(1) is thus the first command-line parameter. The program above is compiled and executed with the following shell commands:
$ ocamlc -o fib fib.ml $ ./fib 10 89 $ ./fib 20 10946
More complex standalone OCaml programs are typically composed of multiple source files, and can link with precompiled libraries. Chapters 9 and 12 explain how to use the batch compilers ocamlc and ocamlopt. Recompilation of multi-file OCaml projects can be automated using third-party build systems, such as the ocamlbuild compilation manager.