Power plotting with ggplot2

September 21st, 2009

The importance of good data visualization is hard to overstate. However, as soon as you have multiple related datasets with many different variables, making useful graphics becomes a major challenge. Excel is of course a no-go for biogeeks, and usually we and most of our geek colleagues around the world turn to R. However, R’s basic graphics capabilities are not very well suited to organizing and producing lots of plots at the same time.

Here is a pharmacology example: say you want to make 12 dose-response plots of various compounds tested in various cell lines. With base R this would require writing a for-loop and fiddling around a lot with axis and plot labelling and the par() function to make them fit on one page. You would also have to be extremely careful to keep the code general and reusable for next time, when you have different compounds and different cell lines.
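To give a feel for that base R bookkeeping, here is a minimal sketch. The `toy` data frame and its values are invented purely for illustration (only 2 compounds and 2 cell lines, with a made-up response curve); the column names simply mirror the ones used later in this post.

```r
# Hypothetical toy data: 2 compounds x 2 cell lines, invented values
toy <- expand.grid(
  Concentration = 10^(-2:2),
  Compound      = c("Compound 1", "Compound 2"),
  Cell.line     = c("Cell line A", "Cell line B"),
  stringsAsFactors = FALSE
)
toy$Percent.of.control <- 100 / (1 + toy$Concentration)

compounds  <- unique(toy$Compound)
cell_lines <- unique(toy$Cell.line)

# Lay out the page by hand: one row per compound, one column per cell line
old_par <- par(mfrow = c(length(compounds), length(cell_lines)),
               mar = c(4, 4, 2, 1))
for (cmp in compounds) {
  for (cl in cell_lines) {
    # Subset the rows belonging to this panel
    panel <- toy[toy$Compound == cmp & toy$Cell.line == cl, ]
    plot(panel$Concentration, panel$Percent.of.control,
         log = "x", type = "b", ylim = c(-10, 110),
         xlab = "Concentration", ylab = "Percent of control",
         main = paste(cmp, cl, sep = " / "))
  }
}
par(old_par)  # restore the previous graphics settings
```

Every panel needs its own subsetting, title and axis labels, and the layout has to be recomputed whenever the number of compounds or cell lines changes.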

Enter ggplot2 and the grammar of graphics. ggplot2 is a package that implements the grammar of graphics, which lets you write extremely succinct, almost natural-language-like code that produces stunning visualizations. Here is some code:

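library(ggplot2)

# screening_data: one row per measurement (long format)
p <- qplot(Concentration, Percent.of.control,
           data=screening_data,
           geom=c("point", "smooth"), colour=Response.type) +
        scale_x_log10() +                     # concentrations on a log axis
        facet_grid(Compound ~ Cell.line) +    # rows: compounds, columns: cell lines
        coord_cartesian(ylim=c(-10, 110))     # zoom the y axis without dropping points
print(p)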

And now the plot:

[Figure: ggplot2 dose-response curves with faceting]

If you compare the code and the plot, you will realize that the code contains roughly the words you would use if you were asked to briefly describe the plot in English. Me like!
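The plotting code assumes a long data frame called screening_data that is not shown in this post. If you want something to experiment with, the sketch below simulates a stand-in using base R only. Everything in it is an assumption made for illustration: the compound, cell line and readout labels, the IC50 values, the simple Hill-type curve and the noise level are all invented and are not the real screening data.

```r
set.seed(1)  # reproducible noise

compound_names <- paste("Compound", 1:4)

# One row per combination of compound, cell line, readout and concentration
screening_data <- expand.grid(
  Compound      = compound_names,
  Cell.line     = paste("Cell line", c("A", "B", "C")),
  Response.type = c("Readout 1", "Readout 2"),   # placeholder labels
  Concentration = 10^seq(-3, 2, by = 0.5),
  stringsAsFactors = FALSE
)

# Invented IC50 for each row, depending only on the compound
ic50 <- c(0.01, 0.1, 1, 10)[match(screening_data$Compound, compound_names)]

# Hill-type curve falling from 100% towards 0% of control, plus Gaussian noise
screening_data$Percent.of.control <-
  100 / (1 + screening_data$Concentration / ic50) +
  rnorm(nrow(screening_data), sd = 5)

head(screening_data)
```

With 4 compounds and 3 cell lines this gives the 12 panels from the pharmacology example, and the column names match the ones the qplot() calls in this post expect.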

If you want to compare differences in potency of the compounds across cell lines, that can easily be accomplished by swapping the variables:

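# Same data, but cell lines on the x axis and one column of panels per concentration
q <- qplot(Cell.line, Percent.of.control,
           data=screening_data,
           geom="point", colour=Response.type) +
        facet_grid(Compound ~ Concentration) +   # rows: compounds, columns: concentrations
        coord_cartesian(ylim=c(-10, 110))
print(q)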

[Figure: ggplot2 example with variables switched]

In many ways it is similar to lattice, but ggplot2 is more opinionated, e.g. it forces you to organize your data in long data.frames. Conveniently, it includes several useful tools for reformatting your data structures.
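To make "long" concrete, here is a small sketch of going from a wide table (one column per cell line) to the long layout. It uses base R's reshape() rather than any ggplot2-related helper, simply to stay dependency-free; the `wide` data frame, the cell line names LineA and LineB, and all values are made up for illustration.

```r
# Hypothetical wide table: one column of measurements per cell line
wide <- data.frame(
  Concentration = c(0.1, 1, 10),
  LineA         = c(98, 75, 20),
  LineB         = c(101, 90, 55)
)

# Stack the per-cell-line columns into a single measurement column
long <- reshape(
  wide,
  direction = "long",
  varying   = list(c("LineA", "LineB")),  # columns to stack
  v.names   = "Percent.of.control",       # name of the stacked value column
  timevar   = "Cell.line",                # new column recording the source column
  times     = c("LineA", "LineB"),
  idvar     = "Concentration"
)
rownames(long) <- NULL  # drop the generated row names

long
```

In the long version each row is a single measurement, so Cell.line is an ordinary variable that can be mapped to an axis, a colour or a facet.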

Read more in the draft ggplot2 book, which is freely available from the developer.

Morten Geek stuff

  1. Kamy
    October 5th, 2009 at 19:29 | #1

    Do you prefer ggplot2 over lattice? I’ve been working with ggplot2 but find it still has bugs, so have been considering learning lattice.

    • October 8th, 2009 at 07:45 | #2

      I prefer ggplot2 over lattice, because I find the syntax and logic based on the grammar of graphics much easier to remember. Also, I like the default theme of ggplot2 a lot more than lattice's. But yes, lattice has been around for much longer, is faster and probably has fewer bugs.
