# Bubble tables with R

## Why?

A bubble plot strips the numbers out of a table to show patterns, for contexts where the exact underlying values aren’t that important.

## Simple version

The example data are the percentages of people who are fellows, postgrads, undergrads, female or male at each of the Cambridge colleges.

All the data was copied and pasted from this site^{1}, except Hughes, which I got from their website.

So first I’ll pull the data from a gist I created for the wine spend post.^{2}

```
# Get the data from my gist on GitHub
# Note: read.csv() didn't support SSL when this was written, which is
# why I use this convoluted method
temporaryFile <- tempfile()
download.file("https://gist.githubusercontent.com/epijim/8819934/raw/6c76df80eb095065a9ce0fa4b8f94410ad528fed/college_data.csv",
              destfile = temporaryFile, method = "curl")
mydata <- read.csv(temporaryFile)
```

This data doesn’t suit a bubble plot particularly well. I’m only using it here because I always prefer data that is interesting to me over one of the datasets built into R.

So for each college I’ll convert the count of individuals in each common room into a percentage of the college total. The gender columns (`Percent_male` and `Percent_female`) are used as they come in the data.

I also pull out the college list. I’m trying to use the base R `reshape` here, as I’m terrible with it, but if I was using `reShape` from the `Hmisc` package I would just pull the college names across as a non-time-varying variable.

```
# Get percentage in each common room
mydata$Total <- mydata$Fellows + mydata$Students
mydata$p_under <- round((mydata$Students - mydata$Grads)*100 / mydata$Total, digits=0) # all rounded
mydata$p_post <- round(mydata$Grads*100 / mydata$Total, digits=0)
mydata$p_fellow <- round(mydata$Fellows*100 / mydata$Total, digits=0)
# Extract college for later
college_list <- as.character(mydata$College)
```
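
As a quick check of the arithmetic above, here is a minimal sketch with made-up counts (the numbers and the `toy` name are illustrative, not taken from the college data). It assumes, as the code above does, that `Students` includes `Grads`, so undergraduates are `Students - Grads`.

```r
# Hypothetical counts for a single college (not real data)
toy <- data.frame(Fellows = 50, Students = 450, Grads = 150)

toy$Total    <- toy$Fellows + toy$Students                       # 500
toy$p_under  <- round((toy$Students - toy$Grads) * 100 / toy$Total)
toy$p_post   <- round(toy$Grads * 100 / toy$Total)
toy$p_fellow <- round(toy$Fellows * 100 / toy$Total)

toy[, c("p_under", "p_post", "p_fellow")]
# p_under = 60, p_post = 30, p_fellow = 10
```

Because each share is rounded separately, the three percentages will not always sum to exactly 100.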

Now reshaping the dataset - as I mentioned before, this code is shoddy. `reShape` is my usual go-to. Right now I am on a plane on the way back from a conference though, so I can’t search Stack Overflow for help on how to use `reshape`.

```
# Reshape for plot - reShape (from Hmisc) is a lot easier to use, but this is my
# horrible base R attempt at reshape...
mydata <- mydata[, c("Percent_male", "Percent_female", "p_under", "p_post", "p_fellow")]
# Note: v.names is not set, so the stacked value column is named after the
# first varying column, i.e. Percent_male
mydata_long <- reshape(mydata, idvar = "College_id", ids = row.names(mydata),
                       times = names(mydata), timevar = "Characteristic",
                       varying = list(names(mydata)), direction = "long")
```
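
To see what that `reshape` call produces, here is a small sketch on a made-up two-row data frame (the `toy` data and the `Percent` column name are my own, for illustration only). The one change from the call above is that `v.names` is set, so the stacked value column gets a neutral name rather than inheriting the name of the first column.

```r
# Hypothetical wide data: one row per college, one column per characteristic
toy <- data.frame(p_fellow = c(10, 20), p_post = c(30, 25))

toy_long <- reshape(toy,
                    idvar = "College_id", ids = row.names(toy),
                    times = names(toy), timevar = "Characteristic",
                    varying = list(names(toy)), v.names = "Percent",
                    direction = "long")

# One row per college/characteristic pair: 2 colleges x 2 columns = 4 rows
toy_long
```

Each row of `toy_long` holds one value, the name of the column it came from (`Characteristic`) and the row it came from (`College_id`).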

For the bubble plot, I need the x and y values to be numeric indices that say where each bubble appears. I’m stripping out the labels here and adding them back in when I set up the plot. This seemed the easiest way to me, but I need to come back to it at some point, as I shouldn’t need to do this.

```
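# x: index of each characteristic. The levels are set explicitly so the order
# matches the axis labels below whatever the locale's sort order is
mydata_long$x <- as.numeric(factor(mydata_long$Characteristic,
  levels = c("p_fellow", "p_post", "p_under", "Percent_female", "Percent_male")))
# y: index of each college (its row number in the original data)
mydata_long$y <- as.numeric(mydata_long$College_id)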
library(ggplot2)
# ggplot2 base layer
g <- ggplot(mydata_long)
# Bubble plot - edit limits and seq based on your data.
# size and colour use the stacked value column, which reshape() named Percent_male
(g + geom_point(aes(x = x, y = y,
                    size = Percent_male, colour = Percent_male),
                shape = 16, alpha = 0.80) +
  scale_colour_gradient(limits = c(0, 100), low = "blue", high = "red",
                        breaks = seq(0, 100, by = 10)) +
  scale_y_continuous(breaks = 1:31, labels = college_list) +
  scale_x_continuous(breaks = 1:5,
                     labels = c("Fellow", "Postgrad", "Undergrad", "Female", "Male")) +
  labs(size = "Percent",
       colour = "Percent",
       x = "Characteristic",
       y = "College",
       title = "Bubble plot of percent male/female and percent in each common room")
)
```
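
The numeric `x` index above comes from converting the characteristic names to a factor and taking the integer codes. Here is a minimal sketch of that mapping on a made-up vector (the `chars` and `lv` names are mine). Setting `levels` explicitly fixes the order, whereas the default sorted order can differ between locales when names mix upper and lower case.

```r
# Hypothetical long-format column of characteristic names
chars <- c("Percent_male", "p_post", "p_fellow", "p_post")

# Desired left-to-right order on the x axis
lv <- c("p_fellow", "p_post", "p_under", "Percent_female", "Percent_male")

x <- as.integer(factor(chars, levels = lv))
x          # 5 2 1 2

# The index maps straight back to the name, which is what the axis labels rely on
lv[x]      # same values as chars
```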

## With numbers in the circles

The version of the plot below is based on code suggested by Didzis Elferts on Stack Overflow^{3}.

```
# Alternative: print the value inside each circle
# Square root so that circle area, not radius, scales with the value
mydata_long$radius <- sqrt(mydata_long$Percent_male / pi)
# Play around with the size of the points and text labels to get a good fit
ggplot(mydata_long, aes(x, y)) +
  geom_point(aes(size = radius * 3), # adjust the multiplier (3) to suit
             shape = 21, fill = "white") +
  geom_text(aes(label = Percent_male), size = 4) +
  scale_size_identity() +
  scale_x_continuous(breaks = 1:5,
                     labels = c("Fellow", "Postgrad", "Undergrad", "Female", "Male")) +
  scale_y_continuous(breaks = 1:31, labels = college_list) +
  coord_fixed(ratio = 0.4) +
  theme(panel.grid.major = element_line(linetype = 2, color = "black"),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0)) +
  labs(x = "Characteristic",
       y = "College",
       title = "Bubble plot of percent male/female and percent in each common room")
```
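
The `radius` line in the block above takes a square root so that the area of each circle, rather than its radius, tracks the percentage. A small sketch of why that matters, using made-up values (`pct` is illustrative, not from the college data):

```r
pct <- c(25, 100)            # hypothetical percentages

radius <- sqrt(pct / pi)     # radius of a circle whose area equals pct
area   <- pi * radius^2      # recovers the original values

radius[2] / radius[1]        # 2: four times the value, twice the radius
area                         # 25 100
```

Mapping the raw percentage straight to the radius would instead give the 100% bubble sixteen times the area of the 25% one, which overstates the difference.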