Home > General post > Visualizing 3d data – plotting quartiles separately

Visualizing 3d data – plotting quartiles separately

In this previous post, we’ve looked at displaying three dimensional data.  One major problem is when there is a high density of data, it can be difficult to see what’s going on in a 3 dimensional plot.

One way of looking at the data in more detail is to break it up.  Take a look at this graph:

This is a plot of data of air quality in Nottingham, UK, taken hourly in 2009 (the code to create it in base R is on the bottom of the page).  On the left is a scatterplot of NO2 against ozone (plot A).   The different colours indicate grouping the data by the level of ozone into quartiles.  On the right are plots of the NO vs NO2 for the same data, but a  separate plot for each quartile of the ozone data.  The points are all colour co-ordinated, so the red points indicating the upper quartile of the ozone data in plot A are matched by red points in plot B.

So you can see by comparing plot E and D, that at the lowest quartile of ozone levels, there is a greater spread of both NO2 and NO.

How this is done is pretty simple (most of the code is to make things vaguely pretty).  Essentially, the values for x,y and z are put into a matrix xyz.  The rows of the matrix are ordered according to the z variable.  The rows which deliniate each quartile are calculated, and then the plots for B to E of x vs y are drawn, using only the rows for that quartile.  The axes are plotted so that they are the same scale for each of the plots. There’s not much room for the axis labels – so these are added afterwards with the legend command.

Then on the left the plot for y (on the horizontal axis) and z (on the vertical axis) is drawn, with some added lines to show where the boundaries of each quartile lie.  The colours are stored in the xyz matrix in the col column.  Like most of my code, the graph is portable, you just need to input different values for x, y and z and re-label the names for each variable.  The original dataset is the same one which I have used for my previous posts.  It is from the UK airquality database.  If you copy this file into your working directory and run the code below, you’ll repeat the plot.

Any suggestions for improvements / comments would be most appreciated!

## name the columns of the data
columns <- c("date", "time", "NO", "NO_status", "NO_unit", "NO2",
	"NO2_status", "NO2_unit", "ozone", "ozone_status", "ozone_unit", 
	"SO2", "SO2_status", "SO2_unit")
## read in the data, store it in variable data
data <- read.csv("27899712853.csv", header = FALSE, skip = 7, 
	col.names = columns, stringsAsFactors = FALSE)

## now make the x,y and z variables

x <- data$NO
y <- data$NO2
z <- data$ozone
cols <- rep(1,length(z))

xyz <- cbind (x,y)
xyz <- cbind(xyz,z)
xyz <- cbind(xyz,cols)
colq1 <- 6
colq2 <- 4
colq3 <- 3
colq4 <- 2

xl <- "NO"
yl <- "NO2"
zl <- "Ozone"

point <- 20 

# re order by z
xyz <- xyz[order(xyz[,3]),]
# now define the row numbers for the quartile boundries
maxxyz <-  nrow(xyz)
q1xyz <- round(maxxyz/4)
medianxyz <-  round(maxxyz/2)
q3xyz <- round(maxxyz*3/4)

# assign colours to xyz$col
xyz[1:q1xyz,4] <- colq1
xyz[q1xyz:medianxyz,4] <- colq2
xyz[medianxyz:q3xyz,4] <- colq3
xyz[q3xyz:nrow(xyz),4] <- colq4

# define the maximum values for x,y, and z 
# these are used to ensure all the axes are the same scale
maxx <- x[which.max(x)]
maxy <- y[which.max(y)]
maxz <- z[which.max(z)]

# now make the plot
# first job is to save the graphics parameters currently used
def.par <- par(no.readonly = TRUE)
# define the margins around each plot
par("mar" = c(2,2,0.5,0.5))
# make the layout for the plot
layout(matrix(c(5,1,5,2,5,3,5,4), 4, 2, byrow = TRUE))

# now do the four plots on the right
plot(xyz[q3xyz:maxxyz,1],xyz[q3xyz:maxxyz,2], col = colq4, 
	xlab = xl, ylab = yl, pch=point, xlim = c(0,maxx), 
	ylim = c(0,maxy))
legend(x = "right", yl, bty = "n")
legend(x = "topright", "B", bty = "n")

plot(xyz[medianxyz:q3xyz,1],xyz[medianxyz:q3xyz,2], col = colq3,
	pch=point, xlim = c(0,maxx), ylim = c(0,maxy))
legend(x = "right", yl, bty = "n")
legend(x = "topright", "C", bty = "n")

plot(xyz[q1xyz:medianxyz,1],xyz[q1xyz:medianxyz,2], col = colq2, 
	pch=point, xlim = c(0,maxx), ylim = c(0,maxy))
legend(x = "right", yl, bty = "n")
legend(x = "topright", "D", bty= "n")

plot(xyz[0:q1xyz,1],xyz[0:q1xyz,2], col = colq1, pch=point, 
	xlim = c(0,maxx), ylim = c(0,maxy))
legend(x = "right", yl, bty = "n")
legend(x = "bottom", xl, bty = "n")
legend(x = "topright", "E", bty = "n")

# now do the plot on the left
plot(xyz[,2],xyz[,3], col = xyz[,4], pch=point, xlim = c(0,maxy))
legend(x = "bottom", yl, bty = "n")
legend(x = "right", zl, bty = "n")
legend(x = "topright", "A", bty = "n")


## reset the graphics display to default
Categories: General post Tags: ,
  1. Ben Bolker
    July 30, 2010 at 12:43 pm

    This seems like an awful lot of work to implement conditioning plots, which are implemented as (1) coplot() in base R; (2) the conditioning operator | in lattice graphics; (3) faceting in ggplot2 …

    • July 30, 2010 at 1:04 pm

      Well, because I coded it, it’s pretty easy to adjust the layout in anyway I want, and combine it with other graphs (e.g. the scatterplot with histograms...)
      So it is some work, but pretty flexible once done. And I know what’s going on “under the bonnet”…

  1. October 8, 2010 at 5:41 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: