# Data analysis

The Queensland Government Statistician's Office (QGSO) can conduct a range of simple and complex data analyses to help explain what statistics actually say. These analyses give clients better information to support and inform planning, decision making and policy development. The types of analysis we provide are described below.

## Experimental design and sample size selection

Using the correct statistical design when conducting a study or survey is essential to producing unbiased results. Sample selection is based on simple random sampling. Following appropriate statistical principles ensures that the subsequent analysis is valid and free of bias. Power analysis, a useful planning tool, uses knowledge of the variability and expected effects observed in previous studies to determine the sample size, so that the study group is large enough to achieve a particular level of precision.
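As a minimal sketch of these two planning steps, the Python below draws a simple random sample from a sampling frame and estimates the sample size per group for comparing two means. The function names, the two-group comparison, the default significance level (0.05) and power (0.80) are assumptions chosen for illustration, and the formula is the common normal approximation; an exact calculation would use the t distribution.

```python
import math
import random
from statistics import NormalDist


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement,
    giving every unit the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2,
    where sigma is the standard deviation (e.g. from a previous study) and
    delta is the smallest difference in means worth detecting.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)  # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole number of units


n = sample_size_per_group(sigma=10, delta=5)  # 63 units per group
sample = simple_random_sample(list(range(1, 1001)), n, seed=1)
```

Halving the detectable difference `delta` roughly quadruples the required sample size, which is why realistic estimates of variability and effect size from earlier studies matter at the planning stage.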

## Exploratory data analysis

Following the initial checking, cleaning and coding of the data, exploratory analysis of variables has three aspects:

- measures of the location, or central tendency, of the data (e.g. means, medians)
- measures of the variability of the data (e.g. ranges, extreme values)
- graphical methods (e.g. frequency distributions, box plots, scatter plots). Plotting frequency distributions and box plots highlights outliers and extreme data points, and a scatter plot may suggest an association between two variables as well as show outliers.
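The numerical side of these checks can be sketched with Python's standard `statistics` module. The example below is an illustration only: the `explore` function and the sample data are hypothetical, and outliers are flagged with Tukey's box-plot rule (points more than 1.5 times the interquartile range beyond the quartiles), which is one common convention among several.

```python
import statistics


def explore(values):
    """Summarise one numeric variable and flag box-plot outliers."""
    data = sorted(values)
    # Quartiles; statistics.quantiles uses the 'exclusive' method by default,
    # so other software may report slightly different values.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's box-plot fences
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "min": data[0],
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": iqr,
        "outliers": [x for x in data if x < lower or x > upper],
    }


summary = explore([2, 3, 3, 4, 5, 5, 6, 7, 30])
print(summary["median"], summary["outliers"])
```

Here the single large value pulls the mean well above the median, which is the kind of discrepancy a box plot makes visible at a glance.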

Further analysis will depend on the nature of the data, for example whether they are normally distributed or skewed. Exploratory data analysis indicates which further analytical techniques are appropriate for the particular dataset.
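One rough numerical indicator of skewness is the moment coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below is illustrative: the function names are hypothetical and the cut-off of 1 is a rule of thumb assumed here, not a formal test. In practice it would sit alongside a histogram or normal probability plot.

```python
import statistics


def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5.
    Assumes at least two distinct values (otherwise m2 is zero)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def describe_shape(values, threshold=1.0):
    """Label a variable as roughly symmetric or skewed.
    The default threshold of 1 is an assumed rule of thumb."""
    g1 = sample_skewness(values)
    if abs(g1) <= threshold:
        return "roughly symmetric"
    return "right-skewed" if g1 > 0 else "left-skewed"


print(describe_shape([2, 3, 3, 4, 5, 5, 6, 7, 30]))
```

A strongly skewed variable might then be transformed (for example, by taking logarithms) or analysed with nonparametric methods rather than methods that assume normality.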

## Statistical analysis

Statistical methods are tools used to make sense of the data and to summarise the results. Univariate techniques such as t-tests, analysis of variance and their nonparametric equivalents are used to examine whether there are statistically significant differences between groups. Correlation and regression techniques examine the associations between variables. Statistical models such as logistic regression, generalised linear models, risk trees and Bayesian models allow more detailed analysis.

## Multivariate analysis

When several variables are measured on each individual or experimental unit, multivariate statistical techniques are used. For example, principal component analysis or factor analysis can summarise the data by reducing a large set of variables to a smaller number of principal components or factor scores. Techniques such as cluster analysis identify groups of similar units.

## Spatial analysis

Some data have a spatial component, that is, a variable with a geographic reference, such as environmental measurements or the spread of a disease from a central point. Incidence rates can be mapped across geographic regions. The scale of the data, the variability in space and time, and the cut-points between map categories all affect the quality of the resulting statistics.
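The effect of cut-points can be seen in a small sketch that converts case counts to incidence rates per 100,000 people and assigns each region to a map category. The region names, counts, populations and cut-points below are all hypothetical, chosen only to illustrate the mechanics.

```python
from bisect import bisect_right


def incidence_rate(cases, population, per=100_000):
    """Cases per `per` people in a region over the reference period."""
    return cases * per / population


def classify(rate, cut_points, labels):
    """Assign a rate to a map category. cut_points must be ascending and
    labels must have one more entry than cut_points; a rate equal to a
    cut-point falls in the higher category."""
    return labels[bisect_right(cut_points, rate)]


cut_points = [20, 50, 100]
labels = ["under 20", "20 to under 50", "50 to under 100", "100 and over"]

# Hypothetical regions: (cases, population)
regions = {"Region A": (25, 50_000), "Region B": (3, 40_000), "Region C": (90, 60_000)}
for name, (cases, population) in regions.items():
    rate = incidence_rate(cases, population)
    print(name, round(rate, 1), classify(rate, cut_points, labels))
```

Region A sits exactly on a cut-point, so moving that cut-point slightly would change its category and its colour on the map. Small populations also make rates unstable, since a single extra case can shift a region between categories.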

## Data investigation

QGSO uses sophisticated data investigation techniques to discover relationships in data that would otherwise go undetected. When large databases are analysed, traditional statistical methods are often of limited use: they may reveal the patterns the user expects to find while other relationships go undetected. Because these techniques make no theoretical assumptions about the data, their results can be interpreted with reference to whatever existing theory is relevant to the data in question. QGSO has previously used these techniques to detect spatial patterns of offending in crime data, and they have many other applications.

Last reviewed 4 March 2015