Skip to main content

Data Types And Pattern Matching

Course Videos​

Programming Languages CSE341 - Unit 2 (videos)

Compound Types​

  • Base types: int, real, bool, char, unit
  • Compound types: tuples, list, option, records, datatype
  • In any PL, there are 3 different kinds of compound types:
    1. "Each of" type: A compound type t contains values of each of types t1, t2, ... tn
    2. "One of" type: A compound type t containes values of one of types t1, t2, ... tn
    3. "Self Reference" type: A compound type t can refer to other values of type t
  • Examples:
    • Tuples are each-of types e.g. int * bool contains an int and a bool
    • Options are one-of types e.g. int option contains an int or no data
    • Lists are all 3 kinds of types e.g. int list contains an int and another list (self reference) or it contains empty list
    • OOP does one-of types using subclassing and subtypes which is exact opposite of how FP does it

Records​

  • A way of making custom "Each-of" type
  • Syntax: val rec = { field1 = expr1, field2 = expr2, ... };
  • Semantics:
    • Type checking: Determine type of each expression and extend static env by type { field1 : t1, field2 : t2, ...}
    • Evaluation: Evaluate each expr and extend dynamic env by value { field1 = v1, field2 = v2, ...}

Tuples as Syntactic Sugar​

  • Tuple is a syntactic sugar for records which has fields as numbers from 1 ... n that are actually positions
  • Acceessing first element in tuple, #1 some_tuple and some field1 in record #field1 some_record
  • There is no such thing as tuple in ML, it is just a record with field names as numbers e.g. var some_tuple = (true, "hi"); is same as val some_record = { 1=true, 2="hi" };
  • Tuple syntax is just a syntactic sugar for writing the above special record with fields as position numbers
  • (e1, e2, ..., en) is another way of writing {1=e1, 2=e2, ..., n=en}
  • Similarly, type t1 * t2 ... * tn is same as { 1:t1, 2:t2, ..., n:tn }

Datatype Bindings​

  • A way of making custom "One-of" type
datatype mytype = TwoInts of int * int
| Str of string
| Pizza
(* all below a,b,c are of type: mytype *)
(* values of a, b, c contain constructors *)
(* constructors are like tags *)
val a = Str "hi";
val b = TwoInts (2, 3);
val c = Pizza;
  • The above example adds a new type named mytype to the environment
  • Also adds three constructors to the environment TwoInts, Str, Pizza
  • Constructor is a function which makes values of this new type mytype
    • TwoInts : int * int -> mytype
    • Str : string -> mytype
    • Pizza : mytype

Case Expression​

  • Syntax:
case e0 of
p1 => e1
| p2 => e2
...
| pn => en
  • Semantics:
    • Type checking: e1, e2 ..., en need to have same type which is type of whole case expression
    • Evaluation: Evaluate e0 and match against a pattern and for matching pattern evaluate the corresponding expression and return the result
  • Example:
fun f x = (* f has type mytype -> int *)
case x of
Pizza => 3
| TwoInts(i1,i2) => i1 + i2
| Str s => String.size s
  • Each branch has the form p => e where p is a pattern and e is an expression
  • We separate the branches with the | character
  • Patterns look like expressions, but do not think of them as expressions
  • Instead they are used to match against the result of evaluating the case’s first expression (the part after case)
  • This is why evaluating a case-expression is called pattern-matching

This is an example of a data definition for arithmetic expressions containing constants, negations, additions, and multiplications.

datatype exp = Constant of int
| Negate of exp
| Add of exp * exp
| Multiply of exp * exp
  • Thanks to the self-reference, what this data definition really describes is trees
  • Where the leaves are integers and the internal nodes are either negations with one child, additions with two children or multiplications with two children
  • We can write a function that takes an exp and evaluates it:
fun eval e =
case e of
Constant i => i
| Negate e2 => ~ (eval e2)
| Add(e1,e2) => (eval e1) + (eval e2)
| Multiply(e1,e2) => (eval e1) * (eval e2)

(* this function call evaluates to 15 *)
eval (Add (Constant 19, Negate (Constant 4)))
  • We can summarize what we know about datatypes and pattern matching so far as follows:
  • The binding datatype t = C1 of t1 | C2 of t2 | ... | Cn of tn introduces a new type t and each constructor Ci is a function of type ti->t
  • One omits the “of ti” for a variant that “carries nothing” and such a constructor just has type t
  • To “get at the pieces” of a t we use a case expression: case e of p1 => e1 | p2 => e2 | ... | pn => en
  • A case expression evaluates e to a value v, finds the first pattern pi that matches v, and evaluates ei to produce the result for the whole case expression

Type Synonyms​

A type synonym simply creates another name for an existing type that is entirely interchangeable with the existing type.

(* datatype bindings extend static env with new types and constructors *)
datatype suit = Club | Diamond | Heart | Spade
datatype rank = Jack | Queen | King | Ace | Num of int

(* type synonym is just interchangeable name for an already existing type *)
type card = suit * rank
type name_record = { student_num : int option,
first : string,
middle : string option,
last : string }

Lists and Options are Datatypes​

  • Lists and options are just “built in” (i.e., predefined with some special syntactic support) datatypes.
  • For options, all you need to know is SOME and NONE are constructors, which we use to create values (just like before) and in patterns to access the values
fun inc_or_zero intoption =
case intoption of
NONE => 0
| SOME i => i+1
  • The story for lists is similar with a few convenient syntactic peculiarities: [] really is a constructor that carries nothing and :: really is a constructor that carries two things, but :: is unusual because it is an infix operator (it is placed between its two operands), both when creating things and in patterns
(* here x and xs’ are nothing but local variables introduced via pattern-matching *)
fun sum_list xs =
case xs of
[] => 0
| x::xs’ => x + sum_list xs’

fun append (xs,ys) =
case xs of
[] => ys
| x::xs’ => x :: append(xs’,ys)