Physics and Astronomy

# Introduction to Data Analysis and Statistics

## Introduction

This short course is part of the Core Training that must be completed by all research students in the School of Physics. A set of course notes will be issued during the lectures. This course is a broad-brush survey of some of the more common uses and abuses of statistics, and its purpose is to set the context within which specialised techniques exist.

## Problems

The techniques needed to tackle these problems are discussed in the printed notes which accompanied the lectures.

### 1. Sampling

In a strange species of animal, the female/male ratio is 3/2. CW071211-02.tab is a tab-delimited text file listing the heights of a random sample of 66 males and 99 females.

1. Estimate the mean height and its associated standard error by straightforward averaging using all the measurements in the file.
2. For each sex, use the first ten measurements to estimate the mean and standard deviation of the distribution. Hence design an optimal sampling strategy and use it to estimate the same parameters as in part 1 using a total of no more than 50 values from the data set.
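One common reading of "optimal sampling strategy" is Neyman allocation for stratified sampling, in which each stratum receives a share of the sample proportional to its population weight times its standard deviation. The sketch below assumes that reading; the printed notes may intend a different scheme, and the function names are illustrative only.

```python
import math
import statistics


def neyman_allocation(weights, sigmas, total):
    """Split `total` samples across strata in proportion to weight * sigma.

    Returns unrounded sizes; round them so that the sum stays within budget.
    """
    products = [w * s for w, s in zip(weights, sigmas)]
    norm = sum(products)
    return [total * p / norm for p in products]


def stratified_estimate(weights, samples):
    """Weighted mean over strata and its standard error.

    weights: population fraction of each stratum (must sum to 1).
    samples: one list of measurements per stratum (at least two values each).
    """
    mean = sum(w * statistics.fmean(x) for w, x in zip(weights, samples))
    variance = sum(
        w ** 2 * statistics.variance(x) / len(x)
        for w, x in zip(weights, samples)
    )
    return mean, math.sqrt(variance)
```

With a female/male ratio of 3/2 the population weights would be 0.6 and 0.4, and the standard deviations passed to `neyman_allocation` would come from the first ten measurements of each sex.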

### 2. Confidence Intervals

The file CW031205-03.tab contains one hundred independent measurements of the same physical quantity which has a Gaussian distribution. Use:

1. only the first ten measurements,
2. all the measurements,

to estimate, and find (95%) confidence intervals for, the parameters describing the distribution as follows:

1. Given that the measurements come from a distribution of variance 5.29, estimate the mean of the distribution.
2. Given that the measurements come from a distribution of unknown variance, estimate the mean of the distribution.
3. Estimate the variance of the distribution from the measurements.
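For reference, the standard two-sided 95% intervals for Gaussian data are sketched below. The notation is an assumption and may differ from the printed notes: $$\bar{x}$$ is the sample mean, $$s^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2$$ is the unbiased sample variance, $$n$$ is the number of measurements used (10 or 100), and $$z$$, $$t$$ and $$\chi^2$$ with a subscript denote quantiles of the normal, Student's t and chi-squared distributions.

Mean, known variance:

$$\bar{x} \pm z_{0.975}\,\frac{\sigma}{\sqrt{n}}, \qquad z_{0.975} \approx 1.96, \quad \sigma = \sqrt{5.29} = 2.3$$

Mean, unknown variance:

$$\bar{x} \pm t_{n-1,\,0.975}\,\frac{s}{\sqrt{n}}$$

Variance:

$$\frac{(n-1)\,s^2}{\chi^2_{n-1,\,0.975}} \le \sigma^2 \le \frac{(n-1)\,s^2}{\chi^2_{n-1,\,0.025}}$$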

### 3. Bayesian Inference

You are playing a game in which the object is to roll three six-sided dice; the highest score wins. Before the game you believe that there is a 3% chance that your opponent is a cheat. The first time you play, your score is 4-3-6, but your opponent's is 6-6-6. What is your revised opinion of your opponent?
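The update can be sketched with Bayes' theorem once a likelihood is chosen for each hypothesis. The code below assumes, purely for illustration, that a cheat always rolls 6-6-6; the question does not say how a cheat behaves, and a different assumption gives a different posterior.

```python
from fractions import Fraction


def posterior(prior, like_if_true, like_if_false):
    """Bayes' theorem for a two-hypothesis problem."""
    evidence = prior * like_if_true + (1 - prior) * like_if_false
    return prior * like_if_true / evidence


prior_cheat = Fraction(3, 100)        # stated prior belief
p_666_honest = Fraction(1, 6) ** 3    # three fair dice all showing six
p_666_cheat = Fraction(1)             # ASSUMPTION: a cheat always rolls 6-6-6

revised = posterior(prior_cheat, p_666_cheat, p_666_honest)
print(revised, float(revised))
```

Under that assumption the posterior is 648/745, roughly 0.87. Exact fractions are used so that the arithmetic can be checked by hand.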

Optional question for film/musical buffs: In Guys and Dolls, Act II Scene 3, Nathan Detroit is an unwilling participant in a crap game in a sewer. If he initially estimates there is a 1% chance that Big Jule will cheat, how does he revise his estimate when Big Jule wins on both the first and second roll of his 'lucky' dice?

### 4. Maximum-Likelihood Estimators

By representing the estimated mean by the true mean plus a suitable random variable, show that the maximum likelihood estimator for the variance of an unknown Gaussian distribution (i.e. equation 7.12 in the notes) is biased.
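A numerical check can complement the algebra. The sketch below assumes that equation 7.12 is the usual maximum-likelihood form, the mean squared deviation from the sample mean divided by N (the notes are not reproduced here). It averages that estimator over many simulated samples; the function names, sample size and seed are illustrative choices.

```python
import random
import statistics


def ml_variance(sample):
    """ML variance estimate: squared deviations from the sample mean, divided by N."""
    m = statistics.fmean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def average_ml_variance(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Average the ML estimate over `trials` Gaussian samples of size `n`."""
    rng = random.Random(seed)
    estimates = (
        ml_variance([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(trials)
    )
    return statistics.fmean(estimates)
```

For samples of size 5 drawn with unit variance, the average should settle near (N - 1)/N = 0.8 rather than 1, which is the bias the exercise asks you to derive.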

### 5. Non-linear Fitting

The file CW071211-04.tab contains twenty independent measurements of the energy E of a spectral line against magnetic field B. The standard deviation associated with the energy at each field is also listed. Two theories have been proposed to fit the data, each with three parameters:

• Theory A:  $$E = p B + q \sin(B r)$$
• Theory B:  $$E = P B + Q B^2+R B^3$$

Use your favourite curve-fitting software to fit each theory to the data. Remember to find the covariance matrix in each case. If P is not equal to p, explain why not. Can you decide which theory is correct? How accurate do you consider the parameter estimates to be? How accurate does the software you used imply they are?
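As one possible choice of software, the sketch below uses SciPy's `curve_fit`. It is not the course's prescribed tool, and the helper name `fit_theory` is illustrative. The arrays `B`, `E` and `sigma` are assumed to have been loaded from the data file already; the column order of the file is not stated here, so loading is left out.

```python
import numpy as np
from scipy.optimize import curve_fit


def theory_a(B, p, q, r):
    """Theory A: E = p B + q sin(B r)."""
    return p * B + q * np.sin(B * r)


def theory_b(B, P, Q, R):
    """Theory B: E = P B + Q B^2 + R B^3."""
    return P * B + Q * B**2 + R * B**3


def fit_theory(model, B, E, sigma, p0):
    """Weighted least-squares fit.

    Returns best-fit parameters, covariance matrix, chi-squared and
    degrees of freedom. absolute_sigma=True treats the supplied sigma
    values as absolute errors, so the covariance is not rescaled.
    """
    popt, pcov = curve_fit(model, B, E, p0=p0, sigma=sigma, absolute_sigma=True)
    residuals = (E - model(B, *popt)) / sigma
    chi2 = float(np.sum(residuals**2))
    dof = len(B) - len(popt)
    return popt, pcov, chi2, dof
```

The square roots of the diagonal of `pcov` give the parameter uncertainties the software implies, and the off-diagonal terms show how strongly the parameters are correlated. Theory A is non-linear in `r`, so the result can depend on the starting guess `p0`; trying several starting points is a sensible precaution.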

### 6. That's All Folks

There is no question 6! Please hand in your attempts at the above exercises no later than Friday 17th February 2012. Put handwritten answers in my pigeonhole, or, if they are in RTF or PDF, use ELE:

Let me know if you have difficulty with them. CDHW.