PubMed keyword statistics

February 17th, 2009

Today we made a handy little web service available: given a query (PubMed keywords), it visualizes the number of matching PubMed articles over time. This is also a good opportunity to show how such a task can be done with minimal effort in Ruby. Using the Gruff gem and the Bio::PubMed class from the BioRuby gem (bio), it takes roughly 20 lines of should-be-readable code. Here we count, for each year from 2000 to 2009, the PubMed articles containing the terms ‘miRNA OR microRNA’:

#!/usr/bin/ruby
 
require 'rubygems'
require 'gruff'
require 'bio'
require 'date'
 
picture_size = "450x450"
picture_file = "papers.png"
query = "miRNA OR microRNA"
years = (2000 .. Date.today.year).to_a
 
# One esearch call per year; 'rettype' => 'count' asks for the hit count only
papers = years.map{|y| Bio::PubMed.esearch(query,
                                   {:mindate => y,
                                    :maxdate => y,
                                    'rettype' => 'count'})}
 
g = Gruff::Line.new(picture_size)
g.theme_keynote
g.title = "Query: #{query}"
g.data("papers", papers.map{|x| x.to_i})
# Gruff takes x-axis labels as a hash of data index => label text
yearlabels = Hash.new
years.each_with_index{|y,idx| yearlabels[idx]=y.to_s}
g.labels = yearlabels
g.hide_legend = true
g.y_axis_label = "# papers"
g.write(picture_file)

We could also come up with something more sophisticated, for example extracting the journal name for each PubMed entry in the search:

# ...
Bio::PubMed.esearch("miRNA OR microRNA",
                    {:mindate => y, :maxdate => y}).each do |pmid|
  # Fetch the MEDLINE record for this PubMed ID and parse it
  article = Bio::MEDLINE.new(Bio::PubMed.efetch(pmid).first)
  journal = article.journal
# ...

Following the first example, we could make a graph for each of the 5 journals with the most articles in the time interval.
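As a rough sketch of the bookkeeping this would need, assume the journal names have already been collected as [year, journal] pairs (for instance inside the loop above). The helper names top_journals and yearly_counts are made up for illustration, and the records below are invented placeholders that only show the data shape, not real PubMed results:

```ruby
# Hypothetical helper: the n journals with the most articles overall.
# `records` is an array of [year, journal] pairs.
def top_journals(records, n = 5)
  totals = Hash.new(0)
  records.each { |_year, journal| totals[journal] += 1 }
  # Sort by descending count, then by name so ties are deterministic.
  totals.sort_by { |journal, count| [-count, journal] }.first(n).map(&:first)
end

# Hypothetical helper: article counts for one journal, one value per
# entry in `years` and in the same order.
def yearly_counts(records, journal, years)
  years.map { |y| records.count { |year, j| year == y && j == journal } }
end

# Placeholder records, invented purely to illustrate the input format.
records = [
  [2007, "Journal A"], [2007, "Journal B"],
  [2008, "Journal A"], [2008, "Journal A"],
  [2009, "Journal B"], [2009, "Journal C"]
]
years = [2007, 2008, 2009]

top_journals(records, 2).each do |journal|
  puts "#{journal}: #{yearly_counts(records, journal, years).inspect}"
end
```

Each array of yearly counts has the same shape as the papers array in the first example, so it could be handed to g.data(journal, counts) on a Gruff::Line, either one graph per journal or several lines in one graph.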

anders Cool Tools, Geek stuff, biogeek webservice