Hi,
Thanks, John, for your notes from last week's session.
My previous post, which contained a version of the RubyRx DSL, somehow got mixed up with John's notes. (I would have said that was not possible, but it happened.)
It would probably be pretty tough to recover the old post, so I think I won't do that. Besides, RubyRx has gone through a number of significant changes over the last week, so I think it will be better to post the latest version.
Here are the things that have changed:
I added summary stats to the template, and a SummaryStatistics class to the program. So RubyRx can now produce and output the kinds of simple statistics that generally show up in the Clinical Study Report (the CSR, which is the report of clinical trial results that goes to the FDA after the trial is complete).
I removed the hardcoded hashes that gave me a bunch of demographics data to work with, along with the class that held them. In their place, I added a require for the ActiveRecord gem and a subclass of ActiveRecord::Base. I also created a somewhat crude version of an SDTM demographics (DM) table in MySQL on my laptop, and a program that generates a lot of DM-style records. (By "a lot" I mean thousands of records.)
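To make the summary statistics concrete, here is a minimal standalone sketch of the kind of calculation involved: a one-pass running mean and sample variance, plus median, minimum, and maximum. It is not the RubyRx class itself. The helper name `summarize` and the sample ages are made up for illustration, and nil is assumed to mean a missing value.

```ruby
# Standalone sketch, not part of RubyRx. `summarize` is a hypothetical helper.
def summarize(values)
  xs = values.compact.sort            # assumption: nil means "missing"
  n = xs.size
  return { :n => 0 } if n.zero?

  # One-pass running mean and sum of squared deviations.
  mean = 0.0
  s = 0.0
  xs.each_with_index do |x, i|
    delta = x - mean
    mean += delta / (i + 1)
    s += delta * (x - mean)
  end

  # Sample variance (n - 1 denominator); a single value has no spread.
  variance = n > 1 ? s / (n - 1) : 0.0
  mid = n / 2
  median = n.odd? ? xs[mid].to_f : (xs[mid - 1] + xs[mid]) / 2.0

  { :n => n, :mean => mean, :variance => variance,
    :standard_deviation => Math.sqrt(variance),
    :median => median, :minimum => xs.first.to_f, :maximum => xs.last.to_f }
end

stats = summarize([34, 51, nil, 47, 62])   # made-up ages, one missing
puts "n=#{stats[:n]} mean=#{stats[:mean]} median=#{stats[:median]}"
```

The n - 1 denominator is a choice; a CSR table would normally report the sample standard deviation, but that is worth confirming against the table spec.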
So RubyRx is maturing. There's still a decent amount of work left to do. Here's the list of next steps:
1. Show that this will work on other kinds of data (for example, Adverse Events data).
2. Put this on Heroku (including the data) so many of us can work on the code simultaneously.
3. Do code review. I am interested in doing code review at the Boston Ruby group's February Hackfests.
4. Performance. The code works fine for fewer than 1,000 records, but it starts to slow down above 1,000, and above 10,000 it is probably too slow. For example, with 12,000 DM records the program took nearly a minute to run. That's not horrible, but it's not great, either. I am interested in hearing ideas on how to make this faster (caching, etc.). We can go over this at the next session (on Tuesday the 21st) and at the February Hackfests.
This is certainly not an exhaustive list. But it's a start.
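On item 4, one direction that might be worth trying: the code below loads every record once per "on" variable and uses a string instance_eval for each record. The sketch here collects several variables by treatment group in a single pass with plain method calls. It is untested against the real data, so treat it as an idea rather than a measured fix. `Row` is a hypothetical stand-in for DM records so that it runs without a database, and `collect_by` is a made-up helper name.

```ruby
# Hypothetical stand-in for DM records so the sketch runs without MySQL.
Row = Struct.new(:trtgrp, :sex, :age)

rows = [
  Row.new('Placebo', 'M', 34),
  Row.new('Drug',    'F', 51),
  Row.new('Placebo', 'F', 47),
  Row.new('Drug',    'M', 62)
]

# Collect the values of each on-variable, keyed by the by-variable,
# in one pass over the rows. Uses send instead of string instance_eval.
def collect_by(rows, on_vars, by_var)
  result = {}
  on_vars.each { |v| result[v] = Hash.new { |h, k| h[k] = [] } }
  rows.each do |r|
    key = r.send(by_var)
    on_vars.each { |v| result[v][key] << r.send(v) }
  end
  result
end

grouped = collect_by(rows, [:sex, :age], :trtgrp)
p grouped[:age]['Placebo']   # => [34, 47]
p grouped[:sex]['Drug']      # => ["F", "M"]
```

Whether this helps depends on where the minute is actually going (the query, object instantiation, or the evals), so it would be worth timing each part first.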
Here's the code:
require 'rubygems'
require 'active_record'
require 'erb'
t0 = Time.new
class SummaryStatistics
  attr_accessor :n, :mean, :variance, :median, :standard_deviation, :minimum, :maximum, :freq

  def initialize
    @arr = []
  end

  # Number of non-missing (non-nil) values.
  def calc_n
    @arr.compact.size
  end

  # One-pass running mean and sample variance (n - 1 denominator).
  def calc_mean_and_variance
    count, mean, s = 0, 0.0, 0.0
    @arr.each do |x|
      count += 1
      delta = x - mean
      mean += delta / count
      s += delta * (x - mean)
    end
    # A single value has no spread; avoid dividing by zero.
    return mean, (count > 1 ? s / (count - 1) : 0.0)
  end

  def calc_mean
    @arr.inject(0.0) { |sum, e| sum + e } / @n
  end

  # Assumes @arr is already sorted and free of nils (see calculate_stats).
  def calc_median
    mid = @n / 2
    @n % 2 == 1 ? @arr[mid] : (@arr[mid - 1] + @arr[mid]) / 2.0
  end

  #def calc_variance
  #  @arr.size == 1 ? 0 : (@arr.inject(0) { |total, e| total += ((e - @mean) ** 2) }) / (@n - 1)
  #end

  def calc_standard_deviation
    Math.sqrt(@variance)
  end

  def calc_minimum
    min = @arr[0]
    @arr.each { |e| min = e if e != nil and (min == nil or e < min) }
    min.to_f
  end

  def calc_maximum
    max = @arr[0]
    @arr.each { |e| max = e if e != nil and (max == nil or e > max) }
    max.to_f
  end

  def <<(obj)
    @arr << obj
  end

  def calculate_stats
    t1 = Time.new
    # Drop missing values first: sorting an array that contains nil raises.
    @arr.compact!
    @arr.sort!
    @n = calc_n
    if @arr[0].kind_of?(Numeric)
      @mean, @variance = calc_mean_and_variance
      @median = calc_median
      @standard_deviation = calc_standard_deviation
      @minimum = calc_minimum
      @maximum = calc_maximum
    end
    @freq = Hash.new(0)
    @arr.each { |e| @freq[e] += 1 if e != nil }
    puts 'Duration of calc_stats: ' + (Time.new - t1).to_s + ', size: ' + @n.to_s
  end
end
ActiveRecord::Base.establish_connection(
  :adapter => "mysql",
  :host => "localhost",
  :password => "xxx",
  :database => "rubyrx")

class DM < ActiveRecord::Base
end
class Output
  attr_accessor :sex_by_trtgrp, :race_by_trtgrp, :age_by_trtgrp, :trtgrp, :sex, :age

  # Support templating of member data.
  def get_binding
    binding
  end

  def get_statistics(on_and_by_vars_str)
    #puts 'in get_statistics'
    #puts on_and_by_vars_str
    on_and_by_vars_str.to_s =~ /^(.*)_by_(.*)/i
    on_vars_str = $1
    by_vars_str = $2
    #puts 'on_vars_str'
    #puts on_vars_str
    #puts 'by_vars_str'
    #puts by_vars_str
    by_vars_for_hash = ""
    by_vars = by_vars_str.split(/_and_/i)
    #puts 'by_vars'
    #p by_vars
    by_vars.each { |var| by_vars_for_hash << "[r.#{var}]" } unless by_vars.empty?
    #puts 'does it get here?'
    #puts 'by_vars_for_hash'
    #puts by_vars_for_hash
    #puts by_vars.join('_')
    on_vars = on_vars_str.split(/_and_/i)
    #puts 'on_vars'
    #p on_vars
    on_vars.each do |on_var|
      #puts 'in on_vars.each'
      #p on_var
      var_name = "#{on_var}"
      var_name << "_by_#{by_vars.join('_')}" unless by_vars.empty?
      #puts 'var_name'
      #puts var_name
      instance_eval("@#{var_name} = Hash.new")
      #puts 'sex_trtgrp'
      #p @sex_trtgrp
      # Note: trtgrp is hard-coded in this query.
      distinct_records = DM.find(:all, :select => "DISTINCT trtgrp, #{on_var}")
      #puts 'distinct_records'
      #p distinct_records
      records = DM.find(:all)
      distinct_records.each do |r|
        instance_eval("@#{var_name}#{by_vars_for_hash} = SummaryStatistics.new")
        #puts 'in @sdtm.each = 0 loop'
        #p @sex_trtgrp
      end
      records.each do |r|
        instance_eval("@#{var_name}#{by_vars_for_hash} << r.#{on_var}")
        #puts 'in @sdtm.each += 0 loop'
        #p @sex_trtgrp
      end
      distinct_records.each do |r|
        instance_eval("@#{var_name}#{by_vars_for_hash}.calculate_stats")
        #puts 'in @sdtm.each += 0 loop'
        #p @sex_trtgrp
      end
    end
    return self
  end

  def method_missing(sym, *args)
    # Only handle names of the form <method>_on_<vars>; anything else
    # falls through to the normal NoMethodError.
    return super unless sym.to_s =~ /^(.*)_on_(.*)/i
    method_name = $1
    on_and_by_vars_str = $2
    #puts 'in method missing'
    #puts method_name
    #puts on_and_by_vars_str
    instance_eval("#{method_name}('#{on_and_by_vars_str}')")
  end
end
begin
  o = Output.new.get_statistics_on_sex_and_race_and_age_by_trtgrp.get_statistics_on_trtgrp_by_.get_statistics_on_sex_by_.get_statistics_on_race_by_.get_statistics_on_age_by_
  #o = Output.new.get_statistics_on_sex_by_trtgrp
  #File.open('saved_object.txt', 'w+') do |f|
  #  Marshal.dump(o, f)
  #end
  #puts 'Elapsed time: ' + (Time.new - t0).to_s
  #p o.sex_by_trtgrp
  #p o.race_by_trtgrp
  #p o.age_by_trtgrp
  #p o.sex
  #p o.trtgrp
  #p o.race
  #p o.age
  #x = File.open('saved_object.txt', 'r') do |f|
  #  Marshal.load(f)
  #end
  File.open('T14.2-Demo-Template.erb', 'r') do |t|
    File.open('T14.2-Demo.rhtml', 'w') do |f|
      # Read the template as a single string. (readlines.to_s only joins
      # the lines on Ruby 1.8; on later versions it returns the array's
      # inspect output.)
      f.puts ERB.new(t.read).result(o.get_binding)
    end
  end
rescue => e
  puts e
ensure
  puts 'Elapsed time: ' + (Time.new - t0).to_s
end
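The method_missing trick above turns a name like get_statistics_on_sex_and_race_and_age_by_trtgrp into a method name plus lists of "on" and "by" variables. Here is a small standalone sketch of just that parsing step, with no database or eval involved. `parse_dsl_name` is a hypothetical helper, not something in RubyRx, and it assumes every name contains both `_on_` and `_by_` (as the calls above do).

```ruby
# Hypothetical helper: split a DSL-style method name into its parts.
# Returns nil when the name does not follow <method>_on_<vars>_by_<vars>.
def parse_dsl_name(name)
  m = /\A(.+?)_on_(.*?)_by_(.*)\z/i.match(name.to_s)
  return nil unless m
  { :method  => m[1],
    :on_vars => m[2].split(/_and_/i),
    :by_vars => m[3].split(/_and_/i) }
end

parsed = parse_dsl_name(:get_statistics_on_sex_and_race_and_age_by_trtgrp)
p parsed[:method]    # => "get_statistics"
p parsed[:on_vars]   # => ["sex", "race", "age"]
p parsed[:by_vars]   # => ["trtgrp"]

# A trailing "_by_" with nothing after it means "no by-variables".
p parse_dsl_name(:get_statistics_on_trtgrp_by_)[:by_vars]   # => []
```

One limitation to keep in mind: a column whose own name contains `_and_`, `_on_`, or `_by_` would be split in the wrong place.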
And here's the current DM template:
ACME Pharmaceuticals
Study XXXXXXX
Demographics Table
Placebo Drug Total
N=<%=@trtgrp.freq['Placebo']%> N=<%=@trtgrp.freq['Drug']%> N=<%=@trtgrp.n%>
Sex
M <%=@sex_by_trtgrp['Placebo'].freq['M']%> <%=@sex_by_trtgrp['Drug'].freq['M']%> <%=@sex.freq['M']%>
F <%=@sex_by_trtgrp['Placebo'].freq['F']%> <%=@sex_by_trtgrp['Drug'].freq['F']%> <%=@sex.freq['F']%>
Race
White <%=@race_by_trtgrp['Placebo'].freq['WHITE']%> <%=@race_by_trtgrp['Drug'].freq['WHITE']%> <%=@race.freq['WHITE']%>
Black <%=@race_by_trtgrp['Placebo'].freq['BLACK']%> <%=@race_by_trtgrp['Drug'].freq['BLACK']%> <%=@race.freq['BLACK']%>
Asian <%=@race_by_trtgrp['Placebo'].freq['ASIAN']%> <%=@race_by_trtgrp['Drug'].freq['ASIAN']%> <%=@race.freq['ASIAN']%>
Other <%=@race_by_trtgrp['Placebo'].freq['Other']%> <%=@race_by_trtgrp['Drug'].freq['Other']%> <%=@race.freq['Other']%>
Age
n <%=@age_by_trtgrp['Placebo'].n%> <%=@age_by_trtgrp['Drug'].n%> <%=@age.n%>
mean <%=sprintf("%0.1f", @age_by_trtgrp['Placebo'].mean)%> <%=sprintf("%0.1f", @age_by_trtgrp['Drug'].mean)%> <%=sprintf("%0.1f", @age.mean)%>
std <%=sprintf("%0.2f", @age_by_trtgrp['Placebo'].standard_deviation)%> <%=sprintf("%0.2f", @age_by_trtgrp['Drug'].standard_deviation)%> <%=sprintf("%0.2f", @age.standard_deviation)%>
median <%=@age_by_trtgrp['Placebo'].median%> <%=@age_by_trtgrp['Drug'].median%> <%=@age.median%>
min <%=@age_by_trtgrp['Placebo'].minimum%> <%=@age_by_trtgrp['Drug'].minimum%> <%=@age.minimum%>
max <%=@age_by_trtgrp['Placebo'].maximum%> <%=@age_by_trtgrp['Drug'].maximum%> <%=@age.maximum%>
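For anyone who has not used ERB with a binding before, here is a tiny standalone sketch of how the template gets at the instance variables. `MiniOutput` and its counts are made up for illustration. The real program does the same thing through Output#get_binding.

```ruby
require 'erb'

# Hypothetical miniature of the Output class: the template sees the
# object's instance variables through the binding that get_binding returns.
class MiniOutput
  def initialize
    @freq = { 'Placebo' => 2, 'Drug' => 3 }   # made-up counts
  end

  def get_binding
    binding
  end
end

template = "Placebo N=<%= @freq['Placebo'] %>  Drug N=<%= @freq['Drug'] %>"
rendered = ERB.new(template).result(MiniOutput.new.get_binding)
puts rendered   # prints: Placebo N=2  Drug N=3
```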
See you on Tuesday.
Thanks,
Glenn