"… a branch of artificial intelligence, concerns the construction and study of systems that can learn from data." ~ Wikipedia
The R Language
Asking a data frame for a column that doesn't exist isn't an error; it's just `NULL`:

```
> foo = data.frame()
> foo$bar
NULL
```
Vectors are created with the `c` function, which is apparently slang for "combine":
```
> c(1,2,3,4)
[1] 1 2 3 4
```
But, "All arguments are coerced to a common type which is the type of the returned value."
```
> c(1,2,3,4,'lolwut')
[1] "1" "2" "3" "4" "lolwut"
```
```
echo 'x = function(_) { 42 }' > /tmp/foo
```
```
> source('/tmp/foo')

 *** caught segfault ***
address 0x100, cause 'memory not mapped'

Traceback:
 1: source("/tmp/foo")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
```
Assigning to index 0 isn't an error either; it just silently does nothing:

```
> foo <- c(1,2,3,4)
> foo[0] <- 10
> foo
[1] 1 2 3 4
```
And assigning past the end silently grows the vector, padding the gap with `NA`:

```
> foo <- c(1,2,3,4)
> foo[8] <- 10
> foo
[1]  1  2  3  4 NA NA NA 10
```
Overarching Problem: No Static Type System
Overarching Solution: Typed Racket + the Math Library
```
> (require math)
> (pdf (normal-dist 0 1) 0)
0.39894228040143265
> (pdf (beta-dist 2 3) .8)
0.38399999999999995
```
Awesome!
```
> (require math)
...
> (optim initial-values objective-function gradient)
optim: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```
```
> (require math)
...
> (optimize initial-values objective-function gradient)
optimize: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```
```
> (require math)
> (pdf (multivariate-normal-dist 0 1) 0)
multivariate-normal-dist: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```
```
> (require math)
> (pdf (dirichlet-dist (vector 1 2 3 4)) 0)
dirichlet-dist: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```
Dammit.
```
> (histogram data-vector)
histogram: undefined;
 cannot reference undefined identifier
  context...:
   /Applications/Racket v5.90.0.10/collects/racket/private/misc.rkt:87:7
```
Ugh.
Visualizing data without histograms is hard.
I'm building a Racket library of machine-learning-related things.
```racket
#lang typed/racket

(require racket-ml
         plot/typed
         math)

(plot (hist-gen&render (sample (normal-dist 0 1) 1000) 30))
```
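For flavor, here's a minimal sketch of the kind of binning a `hist-gen&render`-style function has to do. This is a hypothetical reimplementation in untyped Racket using plot's built-in `discrete-histogram` renderer; the name `hist` and the code are mine, not the actual racket-ml implementation:

```racket
#lang racket
(require math/distributions plot)

;; Hypothetical sketch: bin the samples into n-bins equal-width bins,
;; then render the bin counts with discrete-histogram.
;; Assumes at least two distinct sample values (so bin width > 0).
(define (hist xs n-bins)
  (define lo (apply min xs))
  (define hi (apply max xs))
  (define w (/ (- hi lo) n-bins))
  (define counts (make-vector n-bins 0))
  (for ([x (in-list xs)])
    ;; clamp so that x = hi falls into the last bin
    (define i (min (sub1 n-bins) (exact-floor (/ (- x lo) w))))
    (vector-set! counts i (add1 (vector-ref counts i))))
  (discrete-histogram
   (for/list ([c (in-vector counts)] [i (in-naturals)])
     ;; label each bar with its bin's left edge
     (vector (~r (+ lo (* i w)) #:precision 2) c))))

(plot (hist (sample (normal-dist 0 1) 1000) 30))
```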
MVN and Dirichlet
The Cholesky decomposition is one way to make precise the notion of a "square root" of a matrix:
\begin{equation}
L = \mathrm{cholesky}(A) \text{ s.t. } A = L L^T
\end{equation}
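The reason it shows up here is the standard recipe for drawing a sample from a multivariate normal \(\mathcal{N}(\vec{\mu}, \Sigma)\) (textbook material, not anything racket-ml-specific):

\begin{equation}
\vec{x} = \vec{\mu} + L \vec{z}, \quad L = \mathrm{cholesky}(\Sigma), \quad \vec{z} \sim \mathcal{N}(\vec{0}, I)
\end{equation}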
The Gaussian process is a distribution over functions:

\begin{equation}
f(\vec{x}) \sim \mathcal{GP}(m(\mathbb{X}), \kappa(\mathbb{X}, \mathbb{X}^{\prime}))
\end{equation}

where \(\mathbb{X}\) is the training data.
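To make the \(\kappa\)'s concrete (this is the standard GP regression result, my gloss rather than a slide from the talk), the posterior predictive mean at test inputs \(\mathbb{X}_*\) is

\begin{equation}
\bar{f}_* = \kappa(\mathbb{X}_*, \mathbb{X}) \left[ \kappa(\mathbb{X}, \mathbb{X}) + \sigma^2 I \right]^{-1} \vec{y}
\end{equation}

and that matrix inverse is exactly where the Cholesky decomposition above earns its keep.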
The Math and Plot libraries handle all of this with ease.
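For instance, building the Gram matrix \(\kappa(\mathbb{X}, \mathbb{X})\) and solving the linear system from the predictive-mean formula takes a few lines with `math/matrix`. The squared-exponential kernel and the data here are made up for illustration, not code from the talk:

```racket
#lang racket
(require math/matrix)

;; Hypothetical example: the Gram matrix K, i.e. kappa(X, X), for a
;; squared-exponential kernel over four 1-D training inputs.
(define xs #(0.0 0.5 1.0 1.5))

(define (kern x y) (exp (* -0.5 (sqr (- x y)))))

(define K
  (build-matrix (vector-length xs) (vector-length xs)
                (lambda (i j) (kern (vector-ref xs i) (vector-ref xs j)))))

;; Add the noise term sigma^2 I and solve (K + sigma^2 I) a = y,
;; the linear system from the predictive-mean formula above.
(define noise-var 0.01)
(define Ky
  (matrix+ K (matrix-scale (identity-matrix (vector-length xs)) noise-var)))
(define y (col-matrix [0.0 0.47 0.84 1.0]))
(matrix-solve Ky y)
```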
emacs vs vim
holy war
What matters is that when doing machine learning I talk about posteriors, priors, and likelihoods.
We seek the posterior, which is a distribution over models.
If we want to predict new data (e.g. predict the weather tomorrow), we ask each model in the posterior what it thinks the new data should be.
For old ("training") data \(D\) and new data \(D'_i\), which is predicted by model \(M_i\):
\(p(D'_i \,|\, D) = p(D'_i \,|\, M_i)p(M_i \,|\, D)\)
The likelihood, \(p(D \,|\, M_i)\), measures how well the model explains our data. For linear regression, we could use the familiar least-squares method.
Our model in this case is \(y = mx + b\), where the model parameters are \(m\) and \(b\).
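As a sketch (my own toy data, not from the talk), here is that fit via the normal equations \(\hat{\theta} = (X^\top X)^{-1} X^\top \vec{y}\) using `math/matrix`:

```racket
#lang racket
(require math/matrix)

;; Hypothetical toy data, roughly y = 2x + 1 plus noise.
(define xs '(0.0 1.0 2.0 3.0))
(define ys '(1.1 2.9 5.2 6.8))

;; Design matrix with a column of x's and a column of ones,
;; so the solution is (col-matrix [m b]).
(define X
  (build-matrix (length xs) 2
                (lambda (i j) (if (= j 0) (list-ref xs i) 1.0))))
(define y (->col-matrix ys))

;; Normal equations: theta = (X^T X)^{-1} X^T y
(define Xt (matrix-transpose X))
(matrix-solve (matrix* Xt X) (matrix* Xt y))
```

Minimizing squared error this way is the same as maximizing a Gaussian likelihood, which is what ties least squares back to \(p(D \,|\, M_i)\).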
The prior encodes subjective knowledge about the world.
Priors are often used to combat overfitting; ridge regression, for example, is just least squares with a Gaussian prior on the weights.
"… the process of drawing conclusions from data that are subject to random variation" ~ Wikipedia
Take a load of data, push it through an algorithm, and produce a model of the process that generated the data.