"… a branch of artificial intelligence, concerns the construction and study of systems that can learn from data." ~ Wikipedia

The R Language

```
> foo = data.frame()
> foo$bar
NULL
```

Vectors are created with the `c` function, which is short for "combine":

```
> c(1,2,3,4)
[1] 1 2 3 4
```

But, "All arguments are coerced to a common type which is the type of the returned value."

```
> c(1,2,3,4,'lolwut')
[1] "1"      "2"      "3"      "4"      "lolwut"
```

```
echo 'x = function(_) { 42 }' > /tmp/foo
```

```
> source('/tmp/foo')

 *** caught segfault ***
address 0x100, cause 'memory not mapped'

Traceback:
 1: source("/tmp/foo")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
```

```
> foo <- c(1,2,3,4)
> foo[0] <- 10
> foo
[1] 1 2 3 4
```

```
> foo <- c(1,2,3,4)
> foo[8] <- 10
> foo
[1]  1  2  3  4 NA NA NA 10
```

Overarching Problem: No Static Type System

Overarching Solution: Typed Racket + the Math Library

```
> (require math)
> (pdf (normal-dist 0 1) 0)
0.39894228040143265
> (pdf (beta-dist 2 3) .8)
0.38399999999999995
```

Awesome!

```
> (require math)
...
> (optim initial-values objective-function gradient)
optim: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```

```
> (require math)
...
> (optimize initial-values objective-function gradient)
optimize: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```

```
> (require math)
> (pdf (multivariate-normal-dist 0 1) 0)
multivariate-normal-dist: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```

```
> (require math)
> (pdf (dirichlet-dist (vector 1 2 3 4)) 0)
dirichlet-dist: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```

Dammit.

```
> (histogram data-vector)
histogram: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```

Ugh.

Visualizing data without histograms is hard.

I'm building a Racket library of machine learning related things.

```
#lang typed/racket

(require racket-ml
         plot/typed
         math)

(plot (hist-gen&render (sample (normal-dist 0 1) 1000) 30))
```

MVN and Dirichlet

The Cholesky decomposition of a matrix is one definition of the notion of a "square root" operation on matrices.

\begin{equation} L = \mathrm{cholesky}(A) \text{ s.t. } A = L L^T \end{equation}

The Gaussian process is a distribution over functions.

\begin{equation} f(\vec{x}) \sim \mathcal{GP}(m(\mathbb{X}), \kappa(\mathbb{X}, \mathbb{X}^{\prime})) \end{equation}

where \(\mathbb{X}\) is the training data.

The Math and Plot libraries handle all of this with ease.
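
The Cholesky step itself is small enough to sketch directly. Here is a minimal illustration in plain Python (not the math library's implementation; the matrix `A` is invented for the example):

```python
import math

def cholesky(A):
    """Return the lower-triangular L such that A = L L^T,
    for a symmetric positive-definite matrix A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                # diagonal entry: sqrt of what remains of A[i][i]
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                # off-diagonal entry: solve against the diagonal of column j
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

A = [[4.0, 2.0],
     [2.0, 3.0]]
L = cholesky(A)   # [[2.0, 0.0], [1.0, sqrt(2)]]
```

Multiplying `L` by its transpose reproduces `A`, which is exactly the "square root" property used when sampling from a multivariate normal.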

emacs vs vim

holy war

What matters is that, when doing machine learning, we talk about posteriors, priors, and likelihoods.

- \(p(M|D)\) is called the posterior
- \(p(D|M)\) is called the likelihood of the data
- \(p(M)\) is called the prior

We seek the posterior, which is a distribution over models.

If we want to predict new data (e.g. predict the weather tomorrow), we ask each model in the posterior what it thinks the new data should be.

For old ("training") data \(D\) and new data \(D'_i\) predicted by model \(M_i\):

\(p(D'_i \,|\, D) = p(D'_i \,|\, M_i)p(M_i \,|\, D)\)
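
These pieces can be made concrete with a toy coin-flip sketch in plain Python; the two candidate models, the prior, and the data below are all invented for illustration:

```python
# two candidate models of a coin, each a probability of heads
models = {"fair": 0.5, "biased": 0.9}
prior  = {"fair": 0.5, "biased": 0.5}   # p(M)
data   = ["H", "H", "H", "T"]           # observed flips, D

def likelihood(theta, flips):
    """p(D|M): probability of the observed flips under heads-probability theta."""
    p = 1.0
    for flip in flips:
        p *= theta if flip == "H" else 1.0 - theta
    return p

# posterior p(M|D) is proportional to p(D|M) p(M), normalized over the models
post = {m: likelihood(theta, data) * prior[m] for m, theta in models.items()}
z = sum(post.values())
post = {m: p / z for m, p in post.items()}

# posterior predictive: ask each model about the next flip, weighted by p(M|D)
p_next_heads = sum(models[m] * post[m] for m in models)
```

Three heads in four flips pulls the posterior toward the biased model, so the predictive probability of heads lands between the two models' values, closer to 0.9.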

The likelihood measures how well a model explains our data. For linear regression, we could use the familiar least-squares method.

Our model in this case is \(y = mx + b\), where the model parameters are \(m\) and \(b\).
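
A minimal sketch of the least-squares fit for \(m\) and \(b\), in plain Python (the data points are invented for illustration):

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]   # roughly y = 2x + 1, with noise

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# closed-form least-squares estimates for y = m*x + b
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - m * mean_x   # here m = 1.94, b = 1.09
```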

The prior encodes subjective knowledge about the world.

Priors are often used to combat over-fitting.
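
For example, a zero-mean Gaussian prior on a regression slope shows up as a ridge penalty that shrinks the estimate toward zero. A toy sketch in plain Python, with the model restricted to \(y = mx\) and the data invented for illustration:

```python
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

def ridge_slope(xs, ys, lam):
    """MAP estimate of m for y = m*x under a zero-mean Gaussian prior on m.
    The prior's strength appears as the ridge penalty lam: lam = 0 recovers
    plain least squares, and larger lam shrinks m toward 0."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
```

With `lam = 0` this is the ordinary least-squares slope; increasing `lam` pulls the fit toward the prior's mean of zero, which is one way a prior combats over-fitting.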

"… the process of drawing conclusions from data that are subject to random variation" ~ Wikipedia

Take a load of data, push it through an algorithm, and produce a model of the process that generated the data.