init-prog-ocaml.md 19.7 KB
Newer Older
1
2
 Introduction to Functional Programming in OCaml
================================================
3
4
5

**M2 LMFI**

6
## Task 0 : Log in !
7

Pierre Letouzey's avatar
Pierre Letouzey committed
8
Log in on some lab computer. Warning : the login to use is *not* directly your university credentials (`@u-paris.fr`), you need to visit first https://stp.math.univ-paris-diderot.fr/ to get a login specific to these lab rooms.
9

10
In case of trouble, there is a "guest" account without password, but with restricted web access later, ask me or a fellow at that moment.
11

Pierre Letouzey's avatar
Pierre Letouzey committed
12
13
Open a browser and retrieve this document via https://frama.link/prelmfi
or through https://www.irif.fr/users/letouzey/edu/proglmfi
14
15


16
## Day 1 : Kickstart
17

Pierre Letouzey's avatar
typo    
Pierre Letouzey committed
18
For today, a simplistic OCaml programming environment will be good enough : http://try.ocamlpro.com . No local installation required, directly try there an OCaml phrase suchandroid-studio-2020.3.1.24 as `let x = 1+2`. Also experiment with the "Editor" tab and its bottom buttons.
19

20
Of course, there are other OCaml environments that you could try later :
21

22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
 - Quite raw but still handy sometimes : the OCaml "toplevel" (interpréteur in French) can be launched in a Terminal window (a.k.a shell, or console) by typing `ocaml` in it. The OCaml prompt `#` appears, and you can then submit some OCaml phrase (here finished by the legacy `;;` terminator). `Ctrl+c` interrupts a running computation, `Ctrl+d` terminates the toplevel.

 - For more serious OCaml programming, favor a code editor such as `emacs` :
   - Launch emacs and open a new file such as `exo.ml` (be sure to use a name with the `.ml` extension otherwise emacs won't detect OCaml code).
   - In the `Tuareg` menu on the left, in submenu `Interactive Mode`, pick `Run OCaml Toplevel` and press Enter to confirm the proposed toplevel.
   - Any phrase written in `exo.ml` can then be evaluated by putting the cursor in it and typing `Ctrl+c Ctrl+e` (or `Evaluate phrase` in the previous submenu).

**Nota Bene** : no need anymore to use the old phrase terminator `;;` (except in direct use of the toplevel) as long as *all* your phrases start with `let`, even test phrases. So instead of `1+2;;` it is now recommanded to write something like `let x = 1+2` or `let _ = 1+2`.

OCaml provides many other tools for more advanced use : compilers (including native ones), debuggers, profilers, etc. But that's another story.


## The functional core of OCaml

OCaml directly derives from the lambda-calculus (due to Church in the 1930s) and its three basic constructions:
37
38

 - variables
39
40
41
42
43
44
45
46
 - function abstraction : `λx.t` (written `fun x -> t` in OCaml)
 - function application : `t u` (same in OCaml).
 
In other communities, the function application may be written `t(u)` (regular math style) or `(t)u` (Krivine style).
Here in OCaml and many other functional languages we favor a light style, with parenthesis only in case of ambiguity.

For instance `x y z` is a shortcut for `(x y) z` : one function `x` applied to two successive arguments `y`, then `z`.
And `x (y z)` is quite different : `y` will be a function receiving `z`, and the result is given to function `x`.
47

48
Important rule of thumb : in function applications, put parenthesis around every argument that isn't a variable or a constant.
49

50
Other important shortcut : `fun x y -> t` is `fun x -> fun y -> t`.
51

52
## Typing
53

54
By default, lambda-calculus is said *raw* or *pure* or *untyped* : you can try applying anything to anything and look if it breaks.
55

56
Some untyped programming languages based on λ : Lisp or Scheme. 
57

58
On the contrary, as most recent functional programming languages, OCaml is *strongly typed* : the non-well-typed terms will be rejected in OCaml during *type-checking*, before running any computation.
59

60
The main type constructor is the *functional arrow* `->`.
61

62
First, simply-typed lambda-calculus : just take lambda-calculus plus a typing judgement `Γ ⊢ t:T` described by the rules below. Here Γ is a *typing context*, i.e. a finite list of variables associated with their respective types.
63

64
65
66
 - If `x:T` is in Γ then `Γ ⊢ x:T`
 - If `Γ+x:τ ⊢ t:σ` then `Γ ⊢ (λx.t):(τ→σ)`
 - If `Γ ⊢ t:(τ→σ)` and `Γ ⊢ u:τ` then `Γ ⊢ (t u) : σ`
67

68
Note that by forgetting all the terms in the previous rules and keeping only the types (and contexts being hence just list of types), we recover the logical rules of the minimal logic : the axiom rule and the implication intro/elim rules. That's the start of the Curry-Howard isomorphism (more on that in other courses).
69

70
OCaml typing system is an extension of these basic rules, for now the main addition is a notion of type variables (`'a` in OCaml syntax) allowing for *polymorphism* (same generic code may be reused in similar situations).
71

72
We'll see soon that OCaml also provides way more than the basic lambda-calculus terms and types. In particular, unlike raw lambda-calculus, OCaml does provide primitive integers 0 1 2... -1 -2 ... of type `int` .
73

74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
#### Exercise : predict OCaml answers

```ocaml
let _ = fun x -> x
let _ = fun x -> y
let _ = fun x -> 2
let _ = (fun x -> x) 2
let _ = 2 3
let _ = (fun x -> x) 2 3
let _ = fun f -> f 2
let _ = fun f -> f 2 3
let _ = fun f -> f (f 2)
let _ = fun x -> x x
```

## Computations

Computation rule for λ-calculus : the β-reduction, where `(λx.t) u` gives `t{x:=u}` (for some reasonable definition of substitution).

In λ-calculus, this reduction may occur anywhere, in any order. 
Example of theoretical property of this reduction : the confluence (see for instance https://plfa.github.io/Confluence/ ).

Famous example of term whose reduction is infinite : `Δ Δ` with `Δ = λx.(x x)`. From that, *fixpoint combinators* can be created and used for defining recursive functions in raw λ-calculus (but that's quite tricky, see the Y combinator for instance).

Now, in simply typed λ-calculus, computations interact nicely with types :

 - The type is preserved during reduction (property known as "subject reduction").
 - Strong normalization : a well-typed term may not have infinite reduction (hence `Δ Δ` cannot be well-typed).
 - Strong normalization + confluence implies that reducing a well-typed term `t` as long as possible in any possible manner always ends on the unique normal form `t₀` of `t`.

OCaml has the same β-reduction rule, but in a controlled order named *call-by-value* (CBV, aka *eager* aka *strict*) : β-reduce only when argument `u` is already fully reduced. Moreover this reduction is *weak* : it never reduces inside a `fun`.

OCaml satisfies a subject reduction property, which is critical to exclude a whole family of runtime errors just by static typing (e.g. no "segfaults" in OCaml). Roughly, "if it types, it runs ok" (at least as long as you do not use *exceptions* or exception-aware functions).

Of course, no direct equivalent of `Δ Δ` or fixpoint combinators in OCaml due to typing. But OCaml is *not* strongly normalizing, on the contrary it provides a notion of general recursivity (`let rec`, see below). Actually, normalizing languages are too restrictive to be of general use, they are not *Turing-complete* (see Coq later).

## A first OCaml extension : let abbreviations

You can give a name `x` to a term `t`: 

 - Globally via `let x = t`. Here `x` can be used during the rest of the session.

 - Locally via `let x = t in u`. Here `x` can only be used inside `u` but not after.

Lexical scoping : at a given program point, only some abbreviations may be visible (early global ones or unfinished local ones).
If the same variable name appears in several `let`, the most recent abbreviation wins, the others are hidden. This way, you may appear to "change" an abbreviation, while `let` are actually *immutable*.

Note: `let x = t in u` may be emulated in λ-calculus through `(λx.u) t`. And for a finished program, you can turn all global `let` in local `let .. in`.

To name functions, either `let` followed by `fun` or use a dedicated syntax with parameters before the = sign :
124
125
126
127

```ocaml
let identity = fun x -> x
let identity x = x
128
129
130
131
132
```

More examples : function projections (which may actually encode pseudo-boolean values) :

```ocaml
133
134
135
136
137
let proj1 x y = x
let proj2 x y = y
let pseudoif b x y = b x y
```

138
#### Exercise : OCaml scoping
139

140
Predict OCaml answers after each phrase (`and` allows *simultaneous* abbreviations)
141
142
143
144
145
146
147
148
149
150
151

```ocaml
let x = 2
```

```ocaml
let x = 3 in
let y = x + 1 in
x + y
```

152
153
154
155
156
157
```ocaml
let x = 3 in
let y = let x = 4 in x + 1 in
x + y
```

158
159
160
161
162
163
164
165
166
167
168
169
170
171
```ocaml
let x = 3
and y = x + 1 in
x + y
```

```ocaml
let x = 3
let f y = y + x
let _ = f 2
let x = 0
let _ = f 2
```

172
#### Exercise : recover parenthesis
173

174
Question: add the needed parenthesis for this code to type-check and run :
175
176
177
178
179
180

```ocaml
let somme x y = x + y
let _ = somme somme somme 2 3 4 somme 2 somme 3 4
```

181
## First data types : boolean and integers
182

183
The raw λ-calculus allows to encode (i.e. emulate) all needed datatype (it is Turing-complete). But these encodings are complex, often inefficient and often incompatible with typing. So OCaml provides primitive data types and data structures instead.
184

185
186
First, the `bool` type and its `true` and `false` constants, and a primitive `if ... then ... else ...` construction.
Some boolean functions :
187

188
189
190
  - negation : `not`
  - logical "and" : `&&`
  - logical "or" : `||`
191

192
Note : evaluating `&&` or `||` is quite particular, it obeys *lazy* rules (Call-by-need) while the rest of OCaml is Call-by-value.
193

194
Boolean answers will typically be obtained after an equality test `x = y` or `x <> y` or a comparison `x < y` or `x <= y`, etc.
195

196
197
198
As already said, OCaml provides a type `int` of "machine" integers (hence with large but finite bounds, beware of overflows).
If needed, there exists a library of arbitrary size numbers.
Basic operations on integers : addition `+`, multiplication `*`, euclidean division `/`, modulo `x mod y`. Beware that `/` is a first example of function that may fail with an exception : `3 / 0`.
199
200


201
#### Exercise : Boolean functions
202

203
  - Write a function `checktauto : (bool->bool)->bool` which test whether a one-argument boolean function is a tautology, i.e. always answers `true`.
204

205
206
207
  - Same with `checktauto2` and `checktauto3` for functions with 2, then 3 arguments. This can be done by brute-force case enumeration, but there are clever ways to do it, for instance reuse `checktauto`.

  - What does the following function `f` ? Write an equivalent `g` function with just some `if` instead of `match`. Verify via `checktauto3` that `f` and `g` are indeed observationally equal.
208
209
210
211
212
213
214
215
216
217

```ocaml
let f x y z = match x, y, z with
  | _ , false , true -> 1
  | false , true , _ -> 2
  | _ , _ , false -> 3
  | _ , _ , true -> 4
```


218
## Recursivity 
219

220
The `let rec` construction allows to reuse in itself the function we're currently writing !
221

222
Beware of loops during evaluation. Check your halting case(s) (for instance when n is 0) and do recursive call on decreasing values.
223

224
#### Exercice : write some usual recursive functions
225

226
 - Factorial
227

228
 - Fibonacci numbers. Beware, the basic recursive function has an exponential behavior (why ?). How to avoid that ?
229

230
 - Gcd
231

232
 - Power function on integers. How to limit the number of multiplications done ?
233

234
## Higher-order functions and partial application
235

236
237
In functional programming, functions are first-class citizens. In particular, they may be passed as arguments of other functions,
or returned as output of a function call.
238

239
Example : the function composition
240
241
242
243
244

```ocaml
let compose f g = fun x -> f (g (x))
```

245
Or equivalently :
246
247
248
249
250

```ocaml
let compose f g x = f (g (x))
```

251
Note and understand the type inferred by OCaml : `('a->'b)->('c->'a)->'c->'b`
252

253
Moreover a function expecting several arguments may receive only some of them, leading to a function expecting the missing arguments. That's partial application.
254

255

256
#### Exercise : functions over functions
257

258
Question: write a `sigma` function such that `sigma f a b` computes the sum of all values returned by function `f` (of type `int->int`) between integers `a` et `b`. What is the type of `sigma` ?
259

260
Question: write a `iter` function such as `iter f n x` computes the n-th iterate of `f` at point `x`. What is the type of `iter` ? Use this `iter` to propose a new implementation of the power function on integers.
261

262
Question: devise a situation of over-application, i.e. a function receiving more arguments than initially expected
263
264


265
## Day 2 : OCaml data structures
266

Pierre Letouzey's avatar
Pierre Letouzey committed
267
268
269
270
271
272
273
274
275
276
277
278
279
## OCaml Pairs

A pair `(a,b)` regroups two elements `a` and `b`. These two elements may be of different types (or not).
The type of this pair is written `τ * σ` when types of `a` and `b` are respectively `τ` and `σ`.

OCaml also provides triplets, quadruplets, n-uplets, all built on the same principle, for instance `(1,2,true)` is a `int * int * bool`.

For using a pair, either project it (using predefined functions `fst` et `snd`),
or access all components at once (via a syntax such as `let (x,y) = ... in ...`).

Example : computing two following Fibonacci numbers efficiently.

```ocaml
Pierre Letouzey's avatar
Pierre Letouzey committed
280
let rec fibs n =
Pierre Letouzey's avatar
Pierre Letouzey committed
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
 if n = 0 then (1,1)
 else
    let (a,b) = fibs (n-1) in
    (b, a+b)
    
let fib n = fst (fibs n)
```

Note: the Curry-Howard isomorphism mentionned last week can be extended to pairs.
Actually the typing rules of pairs `(a,b)` and of projections `fst` and `snd` mimic the logical rules introducing and eliminating conjunctions.

#### Curried and uncurried functions.

If an OCaml function has several arguments, the habit is rather to separate these arguments as seen last week : `fun arg1 arg2 arg3 -> ...`.
This leads to a type of the form `typ_arg1 -> typ_arg2 -> ... -> typ_res`. This kind of function is said *curried* (after logician Haskell Curry).

But another style is to regroup all arguments in a n-uplet : `fun (arg1,arg2,...) -> ...`.
This leads to a type of the form `typ_arg1 * typ_arg2 * ... * typ_argn -> typ_res`. This function is said *uncurried*. In this case, the function cannot be partially applied easily. Conversely, it might sometimes be convenient to treat all these arguments as a unique one.

#### Exercise : let's curry and uncurry

Question : write a function `curry` which converts a function of type `'a*'b -> 'c` into a function of type `'a -> 'b -> 'c`. Write the converse `uncurry` function.

## OCaml Lists

An OCaml list is a structure regrouping many elements of the same type. Example: `[1;2;3;2]` is an `int list`.
A list is an ordered structure : the positions of elements in a list are relevant, `[2;1]` is not the same than `[1;2]` nor `[1;2;1]`.
A list may be of arbitrary length, starting from 0 (the empty list is `[]`). 

The basic constructors of lists are the empty list `[]` and the concatenation `x::l` of an extra element `x` on the left of an existing list `l`.
Note that the `[1;2;3;4]` syntax is just a shortcut for `1::(2::(3::(4::[])))`.

For analyzing a list, there exist predefined functions `List.hd` et `List.tl` (for *head* and *tail*), but favor instead a `match...with` syntax dealing both with the `[]` case and the `x::l` case.

The lists (just as pairs earlier, and trees below) are *immutables* structures : when a list has been built, its elements cannot been modified anymore (unlike for instance *arrays* in OCaml or other programming languages). But it possible to form new lists by reusing older ones (in whole or in right subparts).

The head of a list (its leftmost element) can be accessed in constant time, just as a `x::l` operation. On the contrary, the bottom of a list (its rightmost element) can only be accessed after visiting the whole list, hence in a linear time (proportional to the list length).

#### Exercise : retrieve some usual functions on lists

 - Recode a length function on lists (already provided by OCaml as `List.length`).

 - Recode the concatenation of two lists (already provided by OCaml as operator `@`).
 
 - Recode the concatenation of two lists that reverses the first one on the fly (cf `List.rev_append`). For instance `rev_append [1;2] [3;4] = [2;1;3;4]`.

Pierre Letouzey's avatar
typo    
Pierre Letouzey committed
327
 - Recode a reverse function on lists (a.k.a. mirror), cf. `List.rev`. If your solution has a quadratic complexity, try to find a linear solution.
Pierre Letouzey's avatar
Pierre Letouzey committed
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410

 - Recode the filter function `List.filter`. From a boolean test function `f` and a list `l`, `List.filter f l` keeps only the elements `x` of `l` such that `f x`.

 - Recode `List.map`, which apply a function `f` on all elements of a list.

Optionally, consult documentation for iterators `List.fold_left` and `List.fold_right` and try to recode them.

Beware, when using large lists (more than 30000 elements), some functions may fail with `Stack overflow` error. To avoid that, consider using the *tail rercusive* style (see for instance `rev_append` above) ... ou switch to some more clever structure (trees below, see also arrays).

## Trees 

No predefined trees in OCaml (too many variants would be interesting, depending on your intentions). Luckily, OCaml allows to easily define our own custom *algebraic types*.

For instance, here is a tree datatype with elements of type `'a` decoring each binary node:

```ocaml
type 'a tree =
 | Node of 'a * 'a tree * 'a tree
 | Leaf

let mytree = Node (1, Node (2,Leaf,Leaf), Leaf)
```

For analyzing a tree, once again a `match...with` syntax is very convenient
(or the `function` keyword, which amounts to a `fun` followed by a `match`).

#### Exercise : some usual functions on trees

 - Write the `size` and `depth` functions on trees
 - Write a `tree2list` function flattening a tree into the corresponding list. Also known as infix tree traversal.

When elements are sorted from left to right during an infix tree traversal, such trees are called Binary Search Trees (BST or "arbre binaire de recherche" (ABR) in French). Using BST may greatly improve the complexity of usual operations. 

#### Exercise : binary search trees

 - Write a `search` function checking whether some particular element is in a BST. Proceed by dichotomy.
 - Write a `is_bst` function checking whether a tree is indeed a BST. Which complexity could we attain ? 

If the tree is shallow (i.e. of low depth, i.e. complete or almost complete), the `search` function is logarithmic w.r.t. the tree size.
All the challenge is to keep BST trees as shallow as possible, even when adding extra elements in them. See for instance Red-Black trees or AVL trees. 

OCaml provides by default a library of efficient set structures and operations, cf. the `Set` module.

## OCaml Algebraic Types : design your own types

Besides trees, we can continue defining algebraic types fulfilling specific needs. For instance, here is a micro-language for symbolic computation:

```ocaml
type unop = Sin | Cos
type binop = Plus | Mult

type expr =
 | Cst of int
 | Var of string (* variable name *)
 | Unop of unop * expr
 | Binop of binop * expr * expr

(* encoding 3x+sin(y) : *)
let exemple = Binop(Plus,Binop(Mult,Cst 3,Var "x"),Unop(Sin,Var "y"))
```

Note that transforming a character string such as `"3x+sin(y)"` into the treeish form above (of type `expr`) could be done automatically, but that's far from obvious. That's a lexical and grammatical analysis, for which dedicated tools exists and are quite handy (cf `ocamllex` et `ocamlyacc`).

We could then perform various operations on expressions, such as simbolic simplification or derivation.

#### Exercise : simplification and formal derivation

  - Write a `simpl : expr->expr` function performing as much as possible basic algebraic rules such as `0+x = x`, at any place in the expression.
  - Write a `deriv : string->expr->expr` function computing a formal derivation with respect to a given variable name.

## OCaml is far more than that ...

OCaml is a full-fledge general-purpose language, so we've only seen a narrow part of it.
It also provides:

  - Exceptions (not only for errors, could be used as a programming style)
  - Modules for programming in the large
  - Primitives for imperative programming (`for`, `while`, references, mutable arrays, ...)
  - Primitives for interacting with the rest of the world (I/O, files, system, networks, ...)
  - Object-oriented programming (the "O" of OCaml...)
  - Lazy programming
  - ...