- PhD in statistics, from QUT & Paris-Dauphine
- Honours in Bioinformatics (Griffith), BSc in Statistics (Otago)

- I now live in Brisbane, by way of a few places

2016-02-18

*Key areas*

- Bayesian statistics,
- Mixture and hidden Markov models,
- Bio-statistics/informatics/security.

*Research interests*

- Data-driven, accessible, intuitive tools.

**Making data analysis easier!**

The question I have been asked most often since I started pursuing statistics is:

**"Why…?"**

I can share my top three reasons!

- A sense of urgency,
- tantalizing hope, and
- boundless excitement.

The exponential growth of computing is not slowing down!

It is notoriously hard for our brain to really comprehend what this means.

- If we symbolize ALL of our computational advances to date by this dot \(\rightarrow \cdot\)

- In 10 years this is what we will be dealing with:

Opinions are changing fast, and everyone is coming on-board!

- There is low-hanging fruit: better, easier tools are within reach.

- **The traditional way**: adapt asymptotic theory to realistic sample sizes.
- **The future?**: take advantage of the *features* of Big Data (i.e. closer to the underlying truth).

Amazing things happen when data analysis combines clear research questions, appropriate data, and suitable, accessible tools.

**Accessibility**: easy to use, AND easy to understand what the tool does.

- Simpler models lead to fewer mistakes.
- People will surprise you, if allowed.

- It doesn't have to be just "analysis"! It can be exploration, discovery, and more than a little exciting.

Alzheimer's Disease (AD) currently affects over 342,800 Australians, and this number is expected to rise to 900,000 by 2050.

Cognitive changes occur **very late** in the disease (\(\geq\) 20 years).

During this time, AD causes irreversible damage to the brain!

- Best detection tool we have: imaging of **amyloid \(\beta\)**

SUVR (standardized uptake value ratio) available for 393 individuals: **290 healthy controls (HC) and 103 AD**

Originally, **compared AD to HC**, and so on…

*But something quite interesting is happening here.*

**We expect some HCs to have early-stage AD.**

This means the HC data must contain a **mixture** of individuals, HC and not.

But if different subgroups exist, can't compare AD to HC!

*Undetected subgroups can cause problems…*

We have a *mixture distribution with an unknown number of groups*.

Traditionally, these can be quite painful!

- Much easier to just **include too many groups** in one mixture model.

- Can use prior to tell model what to do with unnecessary groups.

- Model + computational tools available in R package **Zmix**.

- Based on recent developments in Bayesian *asymptotic* theory.
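A sketch of the idea in symbols, assuming Gaussian groups and a symmetric Dirichlet prior on the group weights (these are illustrative assumptions; the exact priors used in Zmix may differ):

\[
p(y_i \mid \pi, \mu, \sigma^2) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(y_i \mid \mu_k, \sigma_k^2),
\qquad
(\pi_1, \dots, \pi_K) \sim \mathrm{Dirichlet}(\alpha, \dots, \alpha)
\]

Here \(K\) is deliberately set larger than the number of groups we expect, and a small \(\alpha\) is the part of the prior that "tells the model what to do" with the extras: it encourages the weights \(\pi_k\) of unnecessary groups to shrink toward zero, so those groups end up (nearly) empty.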

Install the package

devtools::install_github('zoevanhavre/Zmix') # Thank you Hadley!
library(Zmix)
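The model call needs a numeric vector `Y`, which is not defined on these slides. As a stand-in for trying things out, here is a minimal sketch that simulates one. The group means and standard deviations are made up for illustration; only the group sizes echo the 290/103 split mentioned earlier, and this is not the study data.

```r
# Hypothetical example data: simulated values, NOT the real SUVR measurements.
set.seed(1)                       # make the simulation reproducible

n_low  <- 290                     # size of the lower-valued group (illustrative)
n_high <- 103                     # size of the higher-valued group (illustrative)

# Two Gaussian groups with made-up means and standard deviations
Y <- c(rnorm(n_low,  mean = 1.2, sd = 0.10),
       rnorm(n_high, mean = 2.0, sd = 0.25))

# Quick look at the pooled data before fitting any mixture model
length(Y)
summary(Y)
```

Any numeric vector of observations can take the place of the simulated `Y` here.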

Run the model (with 5 groups)

Zmix.Y <- Zmix_univ_tempered(Y, iter = 50000, k = 5)

Process the results

Proc.Zmix.Y <- Process_Output_Zmix(Zmix.Y, Burn = 25000)