PubMed keyword statistics
February 17th, 2009
Today we have made a handy little web service available: given a query (PubMed keywords), it visualizes the number of matching articles in PubMed over time. This is also a good opportunity to demonstrate how such a task can be achieved with minimal effort using Ruby. Using the Ruby packages (gems) Gruff and BioRuby (which provides Bio::PubMed), we can do it in fewer than 20 lines of should-be-readable code. Here we search for PubMed articles published from 2000 up to the current year (2009 at the time of writing) that match the query ‘miRNA OR microRNA’:
    #!/usr/bin/ruby
    require 'rubygems'
    require 'gruff'
    require 'bio'
    require 'date'

    picture_size = "450x450"
    picture_file = "papers.png"
    query = "miRNA OR microRNA"
    years = (2000 .. Date.today.year).to_a

    papers = years.map{|y| Bio::PubMed.esearch(query, {:mindate => y, :maxdate => y, 'rettype' => 'count'})}

    g = Gruff::Line.new(picture_size)
    g.theme_keynote
    g.title = "Query: #{query}"
    g.data("papers", papers.map{|x| x.to_i})

    yearlabels = Hash.new
    years.each_with_index{|y,idx| yearlabels[idx] = y.to_s}
    g.labels = yearlabels

    g.hide_legend = true
    g.y_axis_label = "# papers"
    g.write(picture_file)
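
To try the data-shaping part of the script without network access or the gems installed, the two steps that feed Gruff can be pulled out into small helpers. This is only a sketch: `yearly_counts` and `year_labels` are hypothetical names, not part of Gruff or BioRuby, and the counts below are made-up placeholders rather than real PubMed numbers.

```ruby
# Hypothetical helpers (not part of Gruff or BioRuby).

# Returns one integer per year. The block stands in for the
# Bio::PubMed.esearch call and must return the count for the given year.
def yearly_counts(years)
  years.map { |year| yield(year).to_i }
end

# Builds the Hash assigned to g.labels in the script above:
# index of the data point => label text.
def year_labels(years)
  labels = {}
  years.each_with_index { |year, idx| labels[idx] = year.to_s }
  labels
end

years = (2000..2003).to_a

# Made-up counts, given as strings because the script converts the
# search results with to_i. These are NOT real PubMed numbers.
fake_counts = { 2000 => "10", 2001 => "25", 2002 => "60", 2003 => "140" }

counts = yearly_counts(years) { |year| fake_counts[year] }
labels = year_labels(years)

p counts
p labels
```

In the real script the block would wrap the `Bio::PubMed.esearch` call, and `counts` and `labels` would go to `g.data` and `g.labels` respectively.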

We could also come up with something more sophisticated, for example extracting the journal name for each PubMed entry in the search:
    # ...
    Bio::PubMed.esearch("miRNA OR microRNA", {:mindate => y, :maxdate => y}).each do |pmid|
      article = Bio::MEDLINE.new(Bio::PubMed.efetch(pmid).first)
      journal = article.journal
      # ...
    end
Following the first example, we could make a graph for each of the 5 journals with the most articles in the time interval.
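
One way to get from the per-article journal names to per-journal graph data is to tally the names and keep the most frequent ones. The sketch below assumes the journal names have already been collected into a hash keyed by year; `top_journals`, `journal_series`, and `journals_by_year` are hypothetical names, and the journal names are placeholders, not real search results.

```ruby
# Hypothetical helper: returns the n journal names with the most articles
# across all years, most frequent first (ties broken alphabetically).
def top_journals(journals_by_year, n = 5)
  totals = Hash.new(0)
  journals_by_year.each_value do |journals|
    journals.each { |journal| totals[journal] += 1 }
  end
  totals.sort_by { |journal, count| [-count, journal] }.first(n).map(&:first)
end

# Hypothetical helper: for each journal, one article count per year,
# in the same order as the years array.
def journal_series(journals_by_year, years, journals)
  journals.each_with_object({}) do |journal, series|
    series[journal] = years.map { |year| journals_by_year.fetch(year, []).count(journal) }
  end
end

# Placeholder input: year => journal name of every article found that year.
journals_by_year = {
  2007 => ["Journal A", "Journal B", "Journal A"],
  2008 => ["Journal A", "Journal C", "Journal B", "Journal B"],
  2009 => ["Journal B", "Journal B", "Journal C"]
}

years  = journals_by_year.keys.sort
top    = top_journals(journals_by_year, 2)   # use 5 for the real data
series = journal_series(journals_by_year, years, top)

series.each { |journal, counts| puts "#{journal}: #{counts.join(', ')}" }
```

Each entry of `series` could then be passed to `g.data(journal, counts)` as in the first example, either on its own `Gruff::Line` per journal or together on one chart with the legend left visible.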
