Hi All,
Thanks, John, for your notes from the Jan. 27th class.
I wanted to let you all know that there is a new version of RubyRx in Heroku Garden. The programs are called rubyrx.rb and summary_statistics.rb, and they are in app/models.
This latest version is 4 times faster than the previous one. RubyRx has now gone through 2 major iterations (and many minor ones), and each major iteration brought roughly a 4x performance improvement. The first version took nearly 60 seconds to take 12,000 demographics records and produce a simple summary table. The second version did the same thing in about 15 seconds. The latest version does it in under 4 seconds. This is probably about where we need it to be, at least for now. It's true that this tool will eventually have to work on bigger, more complicated tables, but it seems to me that the performance will still be good enough. (Of course, we'll need to test that.) In any case, clinical studies usually have fewer than 12,000 patients, often a great deal fewer, so I think we are at a good point performance-wise.
One way I improved performance was to use lazy evaluation in the statistical methods. Here's an example:
def n
  @n ||= @arr.size
end
The n method checks whether the @n instance variable holds anything other than nil or false. If it does, it returns that value. If it doesn't, it executes the code to the right of the ||=, stores the result in @n, and returns that value. This means that a separate iteration step to calculate all the stats up front (as in the previous versions) is no longer necessary. It also means that each statistic gets calculated only when it is used, and only once. This saved a great deal of calculation time. Thanks, David, for that tip.
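Here is the `||=` idiom in isolation, as a minimal sketch. `LazyStats` and its `@calls` counter are hypothetical names, not part of RubyRx. The second method shows one caveat. Because `||=` treats `nil` and `false` as "not computed yet", a statistic whose legitimate value is `nil` would be recalculated on every call. Checking `defined?` avoids that.

```ruby
# Hypothetical class illustrating lazy, cached statistics.
class LazyStats
  attr_reader :calls

  def initialize(arr)
    @arr = arr
    @calls = 0 # counts how often an expensive computation actually runs
  end

  # Computed on first use, then cached in @mean.
  def mean
    @mean ||= begin
      @calls += 1
      @arr.inject(0.0) { |sum, e| sum + e } / @arr.size
    end
  end

  # ||= would recompute a nil result every time; defined? tells
  # "never computed" apart from "computed, and the answer was nil".
  def first_negative
    return @first_negative if defined?(@first_negative)
    @calls += 1
    @first_negative = @arr.find { |e| e < 0 }
  end
end

s = LazyStats.new([2, 4, 6])
s.mean           # computes and caches 4.0
s.mean           # returns the cached 4.0 without recomputing
s.first_negative # nil, computed once
s.first_negative # nil, served from the cache
s.calls          # => 2
```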
One feature I added is that a Results object now holds a reference to whichever ActiveRecord class you want to use. So DM is no longer hardcoded into the class. Instead, you pass the name of the table to Results.new when creating a Results object. For example:
dm = Results.new('dms')
This creates a Results object called dm that holds an ActiveRecord class pointing to the demographics table.
Here's another example:
ae = Results.new('aes')
This creates a Results object with an ActiveRecord class that points to the AE table in the database.
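The post doesn't show how `Results.new('dms')` turns a table name into a model class. The sketch below shows one possible approach, and it uses hypothetical names throughout. `FakeRecordBase` stands in for `ActiveRecord::Base` so the sketch runs without a database. `Results#model` is an assumed accessor. With real ActiveRecord, the base class would be `ActiveRecord::Base`. Current Rails sets the table with `self.table_name = ...`, while Rails 2-era code used `set_table_name`.

```ruby
# Hypothetical stand-in for ActiveRecord::Base; it only remembers a table name.
class FakeRecordBase
  class << self
    attr_accessor :table_name # stored per class, so each subclass has its own
  end
end

# Assumed shape of Results: it builds a model class bound to the given table.
class Results
  attr_reader :model

  def initialize(table_name, base = FakeRecordBase)
    @model = Class.new(base) # anonymous subclass, one per Results object
    @model.table_name = table_name
  end
end

dm = Results.new('dms')
ae = Results.new('aes')
dm.model.table_name # => "dms"
ae.model.table_name # => "aes"
```

Passing the base class as a parameter keeps the statistics code independent of any particular table, which is the point of removing the hardcoded DM.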
By the way, the Results class was formerly the Output class. I renamed it because Results is the better description: its objects hold instance variables containing the results of the statistical calculations. Output was misleading, because one thing I really insisted on is keeping all of the output functionality in the ERB templates. So the results are stored in the instance variables of the Results class, in the nested hashes you've seen in previous classes. The ERB templates then use those instance variables however they see fit, in ways the Results class needs to know nothing whatsoever about.
Please let me know if you have any questions about the new code. It's a lot cleaner to read, because much of the fluff is gone. Also, I think RubyRx has reached the point where the core is not going to change a lot from week to week; it will be pretty stable. That is good news. Not only does it make it much easier to see what's going on (so it works a lot better as a teaching example), but it also means that RubyRx is ready to move to the next level.
More on moving to the next level in a future post.
Thanks,
Glenn
Thursday, January 29, 2009
Tuesday, January 27, 2009
27-JAN-2009 Central Square Library
Today
RubyRx from 10000 feet
Where to meet?
Summary Statistics Class
Active Record Migrations
Next Week is Hackfest
In the Rails world the programmer and the database person are the same.
Future
More work on CPE
Logging
I/O
Test::Unit
RSpec
Hackfest: near Park Street. Just bring your computer and go.
10,000 foot view
Part 3:
We've seen this part of the process:
require 'erb'
File.open('T14.2-Demo-Template.erb', 'r') do |t|
  File.open('T14.2-Demo.rhtml', 'w') do |f|
    f.puts ERB.new(t.read).result(Output.new.get_statistics_on_sex_and_race_and_age_by_trtgrp.get_binding)
  end
end
Part 1:
Get the data for DM for the study.
Calculate statistics on sex race age by arm
Put the output into Table 14.2 via the demographics ERB template (T14.2-Demo-Template.erb).
Part 2:
Template Generator
Glenn's idea is that a natural-language processor would parse keywords and generate the Ruby code. He got this idea from Cucumber.
YAML_builder was mentioned as a way to create templates.
Glenn got things to run 4 times faster.
Summary Statistics
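class SummaryStatistics
  attr_reader :freq, :n, :mean, :median, :variance, :min, :max

  def initialize
    @arr  = []
    @freq = Hash.new(0)   # unseen values count as 0
  end

  # Push a value and tally it: pushing 'M', 'M', 'F'
  # leaves @freq == {'M'=>2, 'F'=>1}
  def <<(obj)
    @arr << obj
    @freq[obj] += 1
    self
  end

  def calc_n
    @arr_no_nils.size
  end

  def calc_median
    mid = @n / 2
    @n.odd? ? @arr_no_nils[mid] : (@arr_no_nils[mid - 1] + @arr_no_nils[mid]) / 2.0
  end

  def calc_mean
    @arr_no_nils.inject(0.0) { |sum, e| sum + e } / @n
  end

  # Sample variance (n - 1 denominator)
  def calc_variance
    return 0.0 if @n < 2
    @arr_no_nils.inject(0.0) { |sum, e| sum + (e - @mean) ** 2 } / (@n - 1)
  end

  def calculate_stats
    @arr_no_nils = @arr.compact.sort   # drop nils first; sort fails on nil
    @n = calc_n
    return unless @arr_no_nils.first.kind_of?(Numeric)   # categorical data: freq only
    @mean     = calc_mean
    @median   = calc_median
    @variance = calc_variance
    @min      = @arr_no_nils.min
    @max      = @arr_no_nils.max
  end
end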
Glenn found a method on the internet that calculates the mean and variance at the same time, in a single pass over the data.
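The notes don't name the method. The single-pass update in the posted calc_mean_and_variance follows the pattern known as Welford's online algorithm. Below is a standalone sketch of that algorithm. The function name `mean_and_variance` is hypothetical. It returns the sample variance (n - 1 denominator), matching the commented-out calc_variance in the posted code.

```ruby
# One pass over the data: running mean plus running sum of squared deviations.
def mean_and_variance(values)
  n = 0
  mean = 0.0
  m2 = 0.0 # sum of squared deviations from the current mean
  values.each do |x|
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean) # note: uses the *updated* mean
  end
  variance = n > 1 ? m2 / (n - 1) : 0.0
  [mean, variance]
end

mean, var = mean_and_variance([2, 4, 4, 4, 5, 5, 7, 9])
# mean is approximately 5.0, var approximately 32/7 (about 4.571)
```

The two-pass version (compute the mean, then loop again for the squared deviations) gives the same answer. The single-pass version touches each value once and is less prone to the cancellation error of the naive "sum of squares minus square of sum" formula.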
We talked about lazy initialization.
Migrations
'up': usually you are adding something, but you could also be deleting.
'up' means migrate to the next version.
'down' means revert.
Rails expects plural table names (ending in s), e.g. dms for the DM model.
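# Rails 2-era syntax; Rails 5 and later require ActiveRecord::Migration[version]
class CreateDms < ActiveRecord::Migration
  def self.up
    create_table :dms do |t|
      t.string  :domain
      t.string  :usubjid
      t.text    :comment
      t.decimal :age
    end
  end

  def self.down
    drop_table :dms
  end
end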
rake db:migrate
Monday, January 26, 2009
The next session is on January 27th
Hi,
We are planning to meet at 4:30 on the 27th at the Central Square Library. This will be the last session at the library for the foreseeable future, because the library needs our usual meeting room for temporary offices while the main Cambridge Library is under renovation. We are unsure how long this will last; I will keep you posted.
The plan in the meantime is to meet via Skype. I cannot have a session on the night of February 3rd as I am going to be attending the Boston Ruby Group's monthly Hackfest -- I'll have details on the Hackfest tomorrow night -- so the next session after the one on the 27th will be on February 10th. We will have to meet later in the evening.
I posted the code on the Heroku Garden CPE application. All of the regular attendees of the classes should have received an invitation to that application; please let me know if you did not. The 2 programs, summary_statistics.rb and rubyrx2.rb, are in app/models. The code has gone through still more revisions and is now 4 times faster (15 seconds instead of almost 60 to take 12,000 demographics records and produce a summary table). There are database tables as well, and some dummy records in the demographics table (dms).
By the way, if I haven't mentioned before, CPE stands for Clinical Programming Engine.
The focus of the Jan. 27th meeting will be more on the RubyRx framework, including the SummaryStatistics class, and more on ActiveRecord.
See you tomorrow night.
Glenn
Tuesday, January 20, 2009
Central Square Library, 20JAN2009
Agenda
ActiveRecord and method_missing
Venue
Today's class is the second to last class here at the CS Library.
We are still getting booted:
Alternatives are:
MIT?
Toscanini's
Somerville, Cambridge, Brookline
Conference Call
Does everyone have Skype and a camera? Would it work?
Glenn's using IRB*
* NOTE: There was some talk that Glenn was using IRB for his editing. He isn't.
He is using ERB, though.
require 'rubygems'
require 'active_record'
ActiveRecord makes it easy to build a connection and make updates on different database platforms; it translates the Ruby to SQL.
ActiveRecord::Base.establish_connection(
  :adapter  => "mysql",      # MySQL is the database Glenn's been using
  :host     => "localhost",
  :password => "xxx",
  :database => "rubyrx")
The arguments to establish_connection are just a hash, by the way.
You can also pass raw SQL from Ruby instead of going through ActiveRecord, if speed and performance become an issue.
class DM < ActiveRecord::Base
end

distinct_records = DM.find(:all, :select => "DISTINCT trtgrp, #{on_var}")
In other words: give me all the records for this class that fit these criteria (here, each distinct combination of trtgrp and the on_var column).
Suppose the table contains these rows (sex, treatment group):
M P
M D
M P
M D
F P
F D
The DISTINCT query returns an array of ActiveRecord objects, one for each unique combination in the rows above: M P, M D, F P, F D.
To get all the records, drop the DISTINCT:
records = DM.find(:all)
Each attribute is a column in the table, and each object in the array is a row.
We can iterate over distinct_records:
@sex_by_trtgrp = {}
distinct_records.each do |r|
  @sex_by_trtgrp[r.trtgrp] = SummaryStatistics.new
end
This is a change from the way Glenn did things last week: the statistical part of the code is now abstracted out.
Why did Glenn move from args to on_var?
distinct_records.each do |r|
  instance_eval("@#{on_var}_by_trtgrp[r.trtgrp] = SummaryStatistics.new")
end
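This creates a key in the hash for each treatment group, with a new SummaryStatistics object as its value, e.g. {'Placebo' => #<SummaryStatistics>, 'Drug' => #<SummaryStatistics>}. That sets up the key/value pairs.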
o = Object.new
puts o
This prints something like #<Object:0x...>, the default string representation of an object.
@sex_by_trtgrp
Summary Statistics
class SummaryStatistics
  def initialize
    @arr = []
  end

  def calc_n
    # ...
  end
  # ...and likewise mean, median, max, min

  def <<(obj)
    @arr << obj
  end
end
Each treatment group's object just accumulates its values, e.g. 'DRUG' => ['M','M','F'].
There are no calculations here, only pushing the values into each group's array.
This part does one thing and that part does another; if you change one part, you don't have to change the other.
r.trtgrp is the value of trtgrp for that row.
Next Week
Natural Language Parsing
Future
Logging
IO
Test::Unit
RSpec
Glenn recommends going through the code he just posted to really understand this.
He's thinking of posting it to Heroku.
Performance is slow:
1,000 records take about a second.
Over 10,000 might be a problem: about 60 seconds.
It's not horrible, but it's not great.
Next week we'll do more on SummaryStatistics.
We are on for the Jan. 20th session tonight
Hi,
We are on for the Ruby class tonight at 4:30 at the Central Square Library, in the usual room.
We are going to spend the hour and a half going over the RubyRx DSL.
The specific topics for tonight are ActiveRecord and method_missing.
Please bring a copy of the code if possible. I know that Blogger removes the line spacing and indentation, so the formatting doesn't look great, but I think it is worth seeing the code in its entirety.
Thanks,
Glenn
Monday, January 19, 2009
Current version of RubyRx
Hi,
Thanks, John, for your notes from last week's session.
My previous post with a version of the RubyRx DSL was somehow mixed up with John's notes. (I would have said that that was not possible, but somehow it happened.)
It would probably be pretty tough to recover the old post, so I think I won't do that. Besides, RubyRx has gone through a number of significant changes over the last week, so I think it will be better to post the latest version.
Here are the things that have changed:
I added summary stats to the template, and a SummaryStatistics class to the program. So RubyRx can now produce and output the kinds of simple statistics that generally show up in the Clinical Study Report (the CSR, the report of clinical trial results that goes to the FDA after the trial is complete).
I removed the hardcoded hashes that gave me a bunch of demographics data to work with. I also removed the class that previously held those hashes. In their place, I have included a require to the ActiveRecord gem, and a subclass of ActiveRecord::Base. I also created a somewhat crude version of an SDTM demographics table in MySQL on my laptop, and a program that generates a lot of DM-style records. (By a lot in this context I mean thousands of records.)
So RubyRx is maturing. There's still a decent amount of work left to do. Here's the list of next steps:
1. Show that this will work on other kinds of data (for example, Adverse Events data).
2. Put this on Heroku (including the data) so many of us can work on the code simultaneously.
3. Do code review. I am interested in doing code review at the Boston Ruby group's February Hackfests.
4. Performance. The code works fine for fewer than 1,000 records, but it starts to slow down above 1,000, and above 10,000 it is probably too slow. For example, I created 12,000 DM records, and the program took nearly a minute to execute. That's not horrible, but it's not great, either. I am interested in hearing ideas on how to make this faster (caching, etc.). We can go over this at the next session (on Tuesday the 20th) and at the February Hackfests.
This is certainly not an exhaustive list. But it's a start.
Here's the code:
require 'rubygems'
require 'active_record'
require 'erb'

t0 = Time.new

class SummaryStatistics
  attr_accessor :n, :mean, :variance, :median, :standard_deviation, :minimum, :maximum, :freq

  def initialize
    @arr = []
  end

  def calc_n
    @arr.select { |e| e != nil }.size
  end

  # Single pass: running mean plus running sum of squared deviations.
  # Returns the mean and the sample variance (n - 1 denominator).
  def calc_mean_and_variance
    mean, s = 0.0, 0.0
    @arr.each_with_index do |x, i|
      delta = (x - mean).to_f
      mean += delta / (i + 1)
      s += delta * (x - mean)
    end
    return mean, (@arr.size > 1 ? s / (@arr.size - 1) : 0.0)
  end

  def calc_mean
    @arr.inject(0.0) { |sum, e| sum + e } / @n
  end

  def calc_median
    @n % 2 == 1 ? @arr[@n / 2] : (@arr[(@n / 2) - 1] + @arr[@n / 2]) / 2.0
  end

  #def calc_variance
  #  @arr.size == 1 ? 0 : (@arr.inject(0) { |total, e| total += ((e - @mean) ** 2) }) / (@n - 1)
  #end

  def calc_standard_deviation
    Math.sqrt(@variance)
  end

  def calc_minimum
    min = @arr[0]
    @arr.each { |e| min = e if e != nil and (min == nil or e < min) }
    min.to_f
  end

  def calc_maximum
    max = @arr[0]
    @arr.each { |e| max = e if e != nil and (max == nil or e > max) }
    max.to_f
  end

  def <<(obj)
    @arr << obj
  end

  def calculate_stats
    t1 = Time.new
    @arr.sort!
    @n = calc_n
    if @arr[0].kind_of?(Numeric)
      @mean, @variance = calc_mean_and_variance
      @median = calc_median
      @standard_deviation = calc_standard_deviation
      @minimum = calc_minimum
      @maximum = calc_maximum
    end
    @freq = Hash.new(0)
    @arr.each { |e| @freq[e] += 1 if e != nil }
    puts 'Duration of calc_stats: ' + (Time.new - t1).to_s + ', size: ' + @n.to_s
  end
end

ActiveRecord::Base.establish_connection(
  :adapter  => "mysql",
  :host     => "localhost",
  :password => "xxx",
  :database => "rubyrx")

class DM < ActiveRecord::Base
end

class Output
  attr_accessor :sex_by_trtgrp, :race_by_trtgrp, :age_by_trtgrp, :trtgrp, :sex, :race, :age

  # Support templating of member data.
  def get_binding
    binding
  end

  def get_statistics(on_and_by_vars_str)
    on_and_by_vars_str.to_s =~ /^(.*)_by_(.*)/i
    on_vars_str = $1
    by_vars_str = $2
    by_vars_for_hash = ""
    by_vars = by_vars_str.split(/_and_/i)
    by_vars.each { |var| by_vars_for_hash << "[r.#{var}]" } unless by_vars.empty?
    on_vars = on_vars_str.split(/_and_/i)
    on_vars.each do |on_var|
      var_name = "#{on_var}"
      var_name << "_by_#{by_vars.join('_')}" unless by_vars.empty?
      instance_eval("@#{var_name} = Hash.new")
      distinct_records = DM.find(:all, :select => "DISTINCT trtgrp, #{on_var}")
      records = DM.find(:all)
      # One SummaryStatistics object per distinct combination
      distinct_records.each do |r|
        instance_eval("@#{var_name}#{by_vars_for_hash} = SummaryStatistics.new")
      end
      # Push every record's value into the matching object
      records.each do |r|
        instance_eval("@#{var_name}#{by_vars_for_hash} << r.#{on_var}")
      end
      # Calculate the statistics for each object
      distinct_records.each do |r|
        instance_eval("@#{var_name}#{by_vars_for_hash}.calculate_stats")
      end
    end
    return self
  end

  # Turns e.g. get_statistics_on_sex_by_trtgrp into get_statistics('sex_by_trtgrp')
  def method_missing(sym, *args)
    sym.to_s =~ /^(.*)_on_(.*)/i
    method_name = $1
    on_and_by_vars_str = $2
    instance_eval("#{method_name}('#{on_and_by_vars_str}')")
  end
end

begin
  o = Output.new.get_statistics_on_sex_and_race_and_age_by_trtgrp.get_statistics_on_trtgrp_by_.get_statistics_on_sex_by_.get_statistics_on_race_by_.get_statistics_on_age_by_
  File.open('T14.2-Demo-Template.erb', 'r') do |t|
    File.open('T14.2-Demo.rhtml', 'w') do |f|
      f.puts ERB.new(t.read).result(o.get_binding)
    end
  end
rescue => e
  puts e
ensure
  puts 'Elapsed time: ' + (Time.new - t0).to_s
end
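The Output class above relies on method_missing. A call such as get_statistics_on_sex_and_race_by_trtgrp has no method of that name, so Ruby hands the name to method_missing, which parses it into variable lists. Below is a stripped-down sketch of the same dispatch. `StatsRequest` and its `calls` log are hypothetical. It passes parsed arrays instead of evaluating a string. It also adds `respond_to_missing?` and a `super` fallback, which the posted code doesn't have, so misspelled method names still raise NoMethodError.

```ruby
class StatsRequest
  # on-vars before the first "_by_", by-vars after it
  PATTERN = /\Aget_statistics_on_(.+?)_by_(.*)\z/

  attr_reader :calls

  def initialize
    @calls = []
  end

  # Stand-in for the real work: just records what was parsed.
  def get_statistics(on_vars, by_vars)
    @calls << [on_vars, by_vars]
    self # returning self allows chaining, as in the posted code
  end

  def method_missing(sym, *args, &block)
    if sym.to_s =~ PATTERN
      get_statistics($1.split('_and_'), $2.split('_and_'))
    else
      super # unknown names still raise NoMethodError
    end
  end

  def respond_to_missing?(sym, include_private = false)
    !!(sym.to_s =~ PATTERN) || super
  end
end

r = StatsRequest.new
r.get_statistics_on_sex_and_race_by_trtgrp.get_statistics_on_age_by_
r.calls # equal to [[['sex', 'race'], ['trtgrp']], [['age'], []]]
```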
And here's the current DM template:
ACME Pharmaceuticals
Study XXXXXXX
Demographics Table
Placebo Drug Total
N=<%=@trtgrp.freq['Placebo']%> N=<%=@trtgrp.freq['Drug']%> N=<%=@trtgrp.n%>
Sex
M <%=@sex_by_trtgrp['Placebo'].freq['M']%> <%=@sex_by_trtgrp['Drug'].freq['M']%> <%=@sex.freq['M']%>
F <%=@sex_by_trtgrp['Placebo'].freq['F']%> <%=@sex_by_trtgrp['Drug'].freq['F']%> <%=@sex.freq['F']%>
Race
White <%=@race_by_trtgrp['Placebo'].freq['WHITE']%> <%=@race_by_trtgrp['Drug'].freq['WHITE']%> <%=@race.freq['WHITE']%>
Black <%=@race_by_trtgrp['Placebo'].freq['BLACK']%> <%=@race_by_trtgrp['Drug'].freq['BLACK']%> <%=@race.freq['BLACK']%>
Asian <%=@race_by_trtgrp['Placebo'].freq['ASIAN']%> <%=@race_by_trtgrp['Drug'].freq['ASIAN']%> <%=@race.freq['ASIAN']%>
Other <%=@race_by_trtgrp['Placebo'].freq['Other']%> <%=@race_by_trtgrp['Drug'].freq['Other']%> <%=@race.freq['Other']%>
Age
n <%=@age_by_trtgrp['Placebo'].n%> <%=@age_by_trtgrp['Drug'].n%> <%=@age.n%>
mean <%=sprintf("%0.1f", @age_by_trtgrp['Placebo'].mean)%> <%=sprintf("%0.1f", @age_by_trtgrp['Drug'].mean)%> <%=sprintf("%0.1f", @age.mean)%>
std <%=sprintf("%0.2f", @age_by_trtgrp['Placebo'].standard_deviation)%> <%=sprintf("%0.2f", @age_by_trtgrp['Drug'].standard_deviation)%> <%=sprintf("%0.2f", @age.standard_deviation)%>
median <%=@age_by_trtgrp['Placebo'].median%> <%=@age_by_trtgrp['Drug'].median%> <%=@age.median%>
min <%=@age_by_trtgrp['Placebo'].minimum%> <%=@age_by_trtgrp['Drug'].minimum%> <%=@age.minimum%>
max <%=@age_by_trtgrp['Placebo'].maximum%> <%=@age_by_trtgrp['Drug'].maximum%> <%=@age.maximum%>
See you on Tuesday.
Thanks,
Glenn
Thanks, John, for your notes from last week's session.
My previous post with a version of the RubyRx DSL was somehow mixed up with John's notes. (I would have said that that was not possible, but somehow it happened.)
It would probably be pretty tough to recover the old post, so I think I won't do that. Besides, RubyRx has gone through a number of significant changes over the last week, so I think it will be better to post the latest version.
Here are the things that have changed:
I added summary stats to the template, and a SummaryStats class to the program. So RubyRx can now produce and output the kinds of simple statistics that generally show up in the Clinical Study Report (the CSR, which is the report of clinical trial results that goes to the FDA after the trial is complete).
I removed the hardcoded hashes that gave me a bunch of demographics data to work with. I also removed the class that previously held those hashes. In their place, I have included a require to the ActiveRecord gem, and a subclass of ActiveRecord::Base. I also created a somewhat crude version of an SDTM demographics table in MySQL on my laptop, and a program that generates a lot of DM-style records. (By a lot in this context I mean thousands of records.)
So RubyRx is maturing. There's still a decent amount of work left to do. Here's the list of next steps:
1. Show that this will work on other kinds of data (for example, Adverse Events data).
2. Put this on Heroku (including the data) so many of us can work on the code simmultaneously.
3. Do code review. I am interested in doing code review at the Boston Ruby group's February Hackfests.
4. Performance. The code works fine for < 1000 records, but it starts to slow down when there are over 1,000, and for over 10,000 it is probably too slow. For example, I created 12,000 DM records, and it took nearly a minute for the program to execute. That's not horrible, but it's not great, either. I am interested in hearing ideas on how to make this faster (caching, etc.). We can go over this at the next session (on Tuesday the 21st) and at the February Hackfests.
This is certainly not an exhaustive list. But it's a start.
Here's the code:
require 'rubygems'
require 'active_record'
require 'erb'
t0 = Time.new
class SummaryStatistics
attr_accessor :n, :mean, :variance, :median, :standard_deviation, :minimum, :maximum, :freq
def initialize
@arr = []
end
def calc_n
@arr.select{ |e| e != nil }.size
end
def calc_mean_and_variance
n, mean, s = [0, 0, 0]
@arr.each_with_index do |x, n|
delta = (x - mean).to_f
mean += delta/(n+1)
s += delta*(x - mean)
end
return mean, s/n
end
def calc_mean
@arr.inject { |sum, e| sum += e } / @n
end
def calc_median
@arr.size%2 == 1 ? @arr[(@n / 2.0).ceil - 1] : (@arr[(@n / 2) - 1] + @arr[@n/ 2]) / 2.0
end
#def calc_variance
# @arr.size == 1 ? 0 : (@arr.inject(0) { |total, e| total += ((e - @mean) ** 2) }) / (@n - 1)
#end
def calc_standard_deviation
Math.sqrt(@variance)
end
def calc_minimum
min = @arr[0]
@arr.each { |e| min = e if e != nil and (min == nil or e < min) }
min.to_f
end
def calc_maximum
max = @arr[0]
@arr.each { |e| max = e if e != nil and (max == nil or e > max) }
max.to_f
end
def << (obj)
@arr << obj
end
def calculate_stats
t1 = Time.new
@arr.sort!
@n = calc_n
if @arr[0].kind_of?(Numeric)
@mean, @variance = calc_mean_and_variance
@median = calc_median
@standard_deviation = calc_standard_deviation
@minimum = calc_minimum
@maximum = calc_maximum
end
@freq = Hash.new(0)
@arr.each { |e| @freq[e] += 1 if e != nil}
puts 'Duration of calc_stats: ' + (Time.new - t1).to_s + ', size: ' + @n.to_s
end
end
ActiveRecord::Base.establish_connection(
:adapter => "mysql",
:host => "localhost",
:password => "xxx",
:database => "rubyrx")
class DM < ActiveRecord::Base
end
class Output
attr_accessor :sex_by_trtgrp, :race_by_trtgrp, :age_by_trtgrp, :trtgrp, :sex, :age
# Support templating of member data.
def get_binding
binding
end
def get_statistics(on_and_by_vars_str)
#puts 'in get_statistics'
#puts on_and_by_vars_str
on_and_by_vars_str.to_s =~ /^(.*)_by_(.*)/i
on_vars_str = $1
by_vars_str = $2
#puts 'on_vars_str'
#puts on_vars_str
#puts 'by_vars_str'
#puts by_vars_str
by_vars_for_hash = ""
by_vars = by_vars_str.split(/_and_/i)
#puts 'by_vars'
#p by_vars
by_vars.each { |var| by_vars_for_hash << "[r.#{var}]"} unless by_vars.empty?
#puts 'does it get here?'
#puts 'by_vars_for_hash'
#puts by_vars_for_hash
#puts by_vars.join('_')
on_vars = on_vars_str.split(/_and_/i)
#puts 'on_vars'
#p on_vars
on_vars.each do |on_var|
#puts 'in on_vars.each'
#p on_var
var_name = "#{on_var}"
var_name << "_by_#{by_vars.join('_')}" unless by_vars.empty?
#puts 'var_name'
#puts var_name
instance_eval("@#{var_name} = Hash.new")
#puts 'sex_trtgrp'
#p @sex_trtgrp
distinct_records = DM.find(:all, :select => "DISTINCT trtgrp, #{on_var}")
#puts 'distinct_records'
#p distinct_records
records = DM.find(:all)
distinct_records.each do |r|
instance_eval("@#{var_name}#{by_vars_for_hash} = SummaryStatistics.new")
#puts 'in @sdtm.each = 0 loop'
#p @sex_trtgrp
end
records.each do |r|
instance_eval("@#{var_name}#{by_vars_for_hash} << r.#{on_var}")
#puts 'in @sdtm.each += 0 loop'
#p @sex_trtgrp
end
distinct_records.each do |r|
instance_eval("@#{var_name}#{by_vars_for_hash}.calculate_stats")
#puts 'in @sdtm.each += 0 loop'
#p @sex_trtgrp
end
end
return self
end
def method_missing(sym, *args)
sym.to_s =~ /^(.*)_on_(.*)/i
method_name = $1
on_and_by_vars_str = $2
#puts 'in method missing'
#puts method_name
#puts on_and_by_vars_str
instance_eval("#{method_name}('#{on_and_by_vars_str}')")
end
end
begin
o = Output.new.get_statistics_on_sex_and_race_and_age_by_trtgrp.get_statistics_on_trtgrp_by_.get_statistics_on_sex_by_.get_statistics_on_race_by_.get_statistics_on_age_by_
#o = Output.new.get_statistics_on_sex_by_trtgrp
# File.open('saved_object.txt', 'w+') do |f|
# Marshal.dump(o, f)
#end
#puts 'Elapsed time: ' + (Time.new - t0).to_s
#p o.sex_by_trtgrp
#p o.race_by_trtgrp
#p o.age_by_trtgrp
#p o.sex
#p o.trtgrp
#p o.race
#p o.age
#x = File.open('saved_object.txt', 'r') do |f|
# Marshal.load(f)
#end
File.open('T14.2-Demo-Template.erb', 'r') do |t|
File.open('T14.2-Demo.rhtml', 'w') do |f|
f.puts ERB.new(t.readlines.to_s).result(o.get_binding)
end
end
rescue => e
puts e
ensure
puts 'Elapsed time: ' + (Time.new - t0).to_s
end
And here's the current DM template:
ACME Pharmaceuticals
Study XXXXXXX
Demographics Table
Placebo Drug Total
N=<%=@trtgrp.freq['Placebo']%> N=<%=@trtgrp.freq['Drug']%> N=<%=@trtgrp.n%>
Sex
M <%=@sex_by_trtgrp['Placebo'].freq['M']%> <%=@sex_by_trtgrp['Drug'].freq['M']%> <%=@sex.freq['M']%>
F <%=@sex_by_trtgrp['Placebo'].freq['F']%> <%=@sex_by_trtgrp['Drug'].freq['F']%> <%=@sex.freq['F']%>
Race
White <%=@race_by_trtgrp['Placebo'].freq['WHITE']%> <%=@race_by_trtgrp['Drug'].freq['WHITE']%> <%=@race.freq['WHITE']%>
Black <%=@race_by_trtgrp['Placebo'].freq['BLACK']%> <%=@race_by_trtgrp['Drug'].freq['BLACK']%> <%=@race.freq['BLACK']%>
Asian <%=@race_by_trtgrp['Placebo'].freq['ASIAN']%> <%=@race_by_trtgrp['Drug'].freq['ASIAN']%> <%=@race.freq['ASIAN']%>
Other <%=@race_by_trtgrp['Placebo'].freq['Other']%> <%=@race_by_trtgrp['Drug'].freq['Other']%> <%=@race.freq['Other']%>
Age
n <%=@age_by_trtgrp['Placebo'].n%> <%=@age_by_trtgrp['Drug'].n%> <%=@age.n%>
mean <%=sprintf("%0.1f", @age_by_trtgrp['Placebo'].mean)%> <%=sprintf("%0.1f", @age_by_trtgrp['Drug'].mean)%> <%=sprintf("%0.1f", @age.mean)%>
std <%=sprintf("%0.2f", @age_by_trtgrp['Placebo'].standard_deviation)%> <%=sprintf("%0.2f", @age_by_trtgrp['Drug'].standard_deviation)%> <%=sprintf("%0.2f", @age.standard_deviation)%>
median <%=@age_by_trtgrp['Placebo'].median%> <%=@age_by_trtgrp['Drug'].median%> <%=@age.median%>
min <%=@age_by_trtgrp['Placebo'].minimum%> <%=@age_by_trtgrp['Drug'].minimum%> <%=@age.minimum%>
max <%=@age_by_trtgrp['Placebo'].maximum%> <%=@age_by_trtgrp['Drug'].maximum%> <%=@age.maximum%>
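The template above calls n, mean, standard_deviation, median, minimum, maximum and freq on each statistics object. Here is a sketch of such a class that uses the `||=` lazy evaluation described earlier in this post. The class name SummaryStatistics and all of the method bodies are assumptions, not the RubyRx code. It uses the sample (n - 1) standard deviation. freq returns a Hash.new(0), so a category that is absent from a group prints as 0 rather than blank.

```ruby
# Hypothetical summary-statistics object: each statistic is computed
# lazily on first use and then cached with ||=.
class SummaryStatistics
  def initialize
    @arr = []
  end

  # Add a value and drop any cached results, which are now stale.
  def <<(value)
    @arr << value
    @n = @mean = @standard_deviation = @median = @freq = nil
    self
  end

  def n
    @n ||= @arr.size
  end

  def mean
    @mean ||= @arr.sum.to_f / n
  end

  # Sample standard deviation (divides by n - 1).
  def standard_deviation
    @standard_deviation ||=
      Math.sqrt(@arr.inject(0.0) { |s, x| s + (x - mean)**2 } / (n - 1))
  end

  def median
    @median ||= begin
      sorted = @arr.sort
      mid = sorted.size / 2
      sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
    end
  end

  def minimum
    @arr.min
  end

  def maximum
    @arr.max
  end

  # Count per distinct value; values never seen read as 0.
  def freq
    @freq ||= @arr.each_with_object(Hash.new(0)) { |x, h| h[x] += 1 }
  end
end

# Placebo ages from the January 11 sample data.
age = SummaryStatistics.new
[52, 88, 17, 12, 64, 76, 18, 81].each { |a| age << a }
puts sprintf('%0.1f', age.mean)
puts sprintf('%0.2f', age.standard_deviation)
```

Because `<<` clears the cached values, it is safe to keep adding data after a statistic has been read. The cost is one recomputation per statistic after each change.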
See you on Tuesday.
Thanks,
Glenn
Tuesday, January 13, 2009
13Jan2009
Agenda
Welcome Chris and David
erb
RubyRX
Logging
Next Week
More RubyRx
Week after
IO
Sample db table
patient trt sex race age
Make a summary table based on this dataset:
Trt1 Trt2
Sex M X X
F X X
Age
N X X
Mean X.X X.X
Median X X
Min X X
Max X X
Race
White X X
Asian X X
Black X X
Instead of an X you'd have an instance variable, for instance @sex[:frequency]['Placebo']['M'].
It is a nested hash:
@sex = {:frequency => {'Placebo' => {'M' => 17, 'F' => 21}}}
ERB: embedding Ruby in HTML
require 'erb'
File.open('Template.txt', 'r') do |t|
  File.open('output.html', 'w') do |f|
    f.puts ERB.new(t.read).result(o.get_binding)
  end
end
o can be any object. get_binding is a method added to the object o; it returns o's binding, so the template can see o's instance variables.
Because the template and the object are separate, you get a lot more power: to get a new output file, you just change the template (or the object o).
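Here is a minimal, self-contained sketch of that separation. It assumes a hypothetical Report class, and an inline template string stands in for Template.txt. Kernel#binding is private. That is why the object needs a public get_binding wrapper before ERB#result can see its instance variables.

```ruby
require 'erb'

# Hypothetical stand-in for the Output object: it only holds the numbers.
class Report
  def initialize(placebo_n, drug_n)
    @placebo_n = placebo_n
    @drug_n = drug_n
  end

  # Kernel#binding is private, so expose it for ERB#result.
  def get_binding
    binding
  end
end

# Inline template; in the class notes this text lives in Template.txt.
template = 'Placebo N=<%= @placebo_n %>  Drug N=<%= @drug_n %>'

report = Report.new(8, 8)
output = ERB.new(template).result(report.get_binding)
puts output
```

Changing only the template string changes the output; Report itself stays the same.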
An Output class takes an array: a bunch of objects with the accessor variables that you are interested in.
class Output
  def initialize(sdtm_data)
    @sdtm_data = sdtm_data
  end
This is probably a fudge; you would probably use ActiveRecord to make an object right out of the database.
  def frequency_by_trtgrp(*args)
    args.each do |arg|
      instance_eval("@#{arg} = Hash.new") unless instance_variable_defined?("@#{arg}")
This creates an instance variable for everything you pass in. It is the same as saying:
@sex = Hash.new
You should check that this instance variable doesn't already exist. Otherwise you would overwrite and blow away what you already have. Hence the unless clause.
The #{arg} in the string is interpolated first, so "@#{arg}" becomes "@sex". instance_eval then executes the resulting code in the context of the object. For instance, instance_eval("2+2") returns 4.
@sex[:frequency] = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
This gives you as many levels of nested hashes as you need.
instance_eval passes a string to the Ruby interpreter at run time; you can't do this in Java or C.
The disadvantage is that the innermost level has no numeric default (a missing key returns another empty Hash), so this fails:
@sex[:frequency]['Placebo']['M'] += 1
So Glenn set everything to zero first (inside class Output):
@sdtm_data.each do |e|
  instance_eval("@#{arg}[:frequency][e.trtgrp][e.#{arg}] = 0")
end
self
end
You return self so that the method returns the object itself, which lets you chain calls:
o = Output.new(sdtm_data)
o.freq_by_trtgrp(:race, :sex).desc_stats
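Here is a small sketch of both points above. It uses a hypothetical Row struct in place of DM records. With the self-referencing default proc, the innermost value is also an auto-created Hash, so `+= 1` raises NoMethodError unless each cell is first seeded with 0. Returning self from the method lets the call chain straight off Output.new.

```ruby
# Hypothetical minimal record standing in for a DM row.
Row = Struct.new(:trtgrp, :sex)

# The innermost level has no numeric default: a missing key yields a new Hash.
auto = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
begin
  auto['Placebo']['M'] += 1
rescue NoMethodError
  puts 'innermost value is a Hash, not 0'
end

class Output
  attr_reader :sex

  def initialize(sdtm_data)
    @sdtm_data = sdtm_data
  end

  def freq_by_trtgrp
    @sex = { :frequency => Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) } }
    # First pass: seed every cell that will be counted with 0...
    @sdtm_data.each { |e| @sex[:frequency][e.trtgrp][e.sex] = 0 }
    # ...second pass: count.
    @sdtm_data.each { |e| @sex[:frequency][e.trtgrp][e.sex] += 1 }
    self # return the object itself so calls can be chained
  end
end

data = [Row.new('Placebo', 'M'), Row.new('Placebo', 'F'),
        Row.new('Drug', 'M'), Row.new('Placebo', 'M')]
o = Output.new(data).freq_by_trtgrp
p o.sex
```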
Glenn spoke about method_missing.
You end up with a small set of methods you can use to construct a wide variety of calls, such as:
o.statistics_on_sex_and_race_by_trtgrp
You don't need parameters because the method name gives you that information.
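Here is a sketch of how method_missing can turn such a name into parameters. It keeps the statistics_on_..._by_... naming from the notes. The regular expression, the placeholder statistics method and the respond_to_missing? hook are assumptions for illustration. Falling back to super for unrelated names keeps ordinary typos raising NoMethodError.

```ruby
# Hypothetical sketch: names such as statistics_on_sex_and_race_by_trtgrp are
# parsed in method_missing rather than defined one by one.
class Output
  PATTERN = /\Astatistics_on_(\w+?)_by_(\w+)\z/

  def method_missing(name, *args, &block)
    match = PATTERN.match(name.to_s)
    return super unless match # unrelated names still raise NoMethodError
    statistics(match[1].split('_and_'), match[2].split('_and_'))
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    PATTERN.match?(name.to_s) || super
  end

  private

  # Placeholder: just report what would be computed.
  def statistics(analysis_vars, by_vars)
    { :on => analysis_vars, :by => by_vars }
  end
end

o = Output.new
p o.statistics_on_sex_and_race_by_trtgrp
```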
@sex[:frequency]['Placebo']['M']
Glenn is rethinking the above approach, since it may not be the best way to do it. Instead:
@sex['Placebo']=SimpleStatistics.new
@sex['Placebo'] << e.sex
The output object now knows nothing about statistics.
This changes the template:
<%= @sex['Placebo'].freq['M'] %>
Tonight's Session Is On
Hi All,
We are on for the Ruby session tonight at 4:30 in the Rotary Conference Room in the basement of the Central Square Library (Pearl Street). The session should go to 6 or maybe 6:15.
For tonight's class, we are going to go over logging and the RubyRx clinical programming framework.
We are going to have a couple of new faces tonight, including a Ruby developer that I met at the November Voices That Matter conference that I told you about.
Hope to see you all there!
Glenn
Sunday, January 11, 2009
Improving Version of RubyRx
Hi,
I fixed a bug in the previous version (the nested_hash method was not working properly, so I took it out and solved the problem another way).
I also added method_missing and turned the freq_by_trtgrp method into the far more dynamic freq method in conjunction with method_missing. The code now can take a frequency on any valid variable.
Below is the updated version of the code. I am looking forward to reviewing this in the class on Tuesday the 13th.
class DM
attr_accessor :sex, :race, :brthdtc, :trtgrp, :age
def initialize(obj)
@sex = obj[:sex]
@race = obj[:race]
@brthdtc = obj[:brthdtc]
@trtgrp = obj[:trtgrp]
@age = obj[:age]
end
end
class Output
attr_accessor :sex, :race, :age, :trtgrp
def initialize(sdtm_data)
@sdtm_data = sdtm_data
end
# Support templating of member data.
def get_binding
binding
end
def freq(by_var_list, *args)
by_var_str = ""
by_vars = by_var_list.split(/_and_/i)
if by_vars.length > 0
by_vars.each do |var|
by_var_str << "[e.#{var}]"
end
end
args.each do |arg|
instance_eval("@#{arg} = Hash.new unless instance_variable_defined?(:@#{arg})")
instance_eval("@#{arg}[:frequency] = Hash.new{|h,k| h[k] = Hash.new(&h.default_proc)} unless @#{arg}.keys.include?(:frequency)")
@sdtm_data.a.each do |e|
instance_eval("@#{arg}[:frequency]#{by_var_str}[e.#{arg}] = 0")
end
@sdtm_data.a.each do |e|
instance_eval("@#{arg}[:frequency]#{by_var_str}[e.#{arg}] += 1")
end
end
return self
end
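def method_missing(sym, *args)
# Only handle names like freq_by_trtgrp; anything else raises NoMethodError as usual.
return super unless sym.to_s =~ /^(.*)_by_(.*)/i
method_name = $1
by_var_list = $2
instance_eval("#{method_name}('#{by_var_list}', :#{args.join(', :')})")
end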
end
class DM_Table
attr_accessor :a
def initialize
@a = []
end
def << (obj)
@a << obj
end
end
dt = DM_Table.new
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'BLACK', :brthdtc => '1968-10-01', :age => 52})
dt << DM.new({:trtgrp => 'Drug', :sex => 'F', :race => 'ASIAN', :brthdtc => '1970-07-04', :age => 17})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'WHITE', :brthdtc => '1968-10-01', :age => 88})
dt << DM.new({:trtgrp => 'Drug', :sex => 'M', :race => 'ASIAN', :brthdtc => '1970-07-04', :age => 12})
dt << DM.new({:trtgrp => 'Drug', :sex => 'F', :race => 'ASIAN', :brthdtc => '1970-07-04', :age => 21})
dt << DM.new({:trtgrp => 'Drug', :sex => 'M', :race => 'WHITE', :brthdtc => '1970-07-04', :age => 33})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'F', :race => 'WHITE', :brthdtc => '1968-10-01', :age => 17})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'BLACK', :brthdtc => '1968-10-01', :age => 12})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'WHITE', :brthdtc => '1968-10-01', :age => 64})
dt << DM.new({:trtgrp => 'Drug', :sex => 'F', :race => 'ASIAN', :brthdtc => '1970-07-04', :age => 71})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'WHITE', :brthdtc => '1968-10-01', :age => 76})
dt << DM.new({:trtgrp => 'Drug', :sex => 'M', :race => 'ASIAN', :brthdtc => '1970-07-04', :age => 67})
dt << DM.new({:trtgrp => 'Drug', :sex => 'F', :race => 'WHITE', :brthdtc => '1970-07-04', :age => 6})
dt << DM.new({:trtgrp => 'Drug', :sex => 'M', :race => 'WHITE', :brthdtc => '1970-07-04', :age => 13})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'F', :race => 'WHITE', :brthdtc => '1968-10-01', :age => 18})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'F', :race => 'BLACK', :brthdtc => '1968-10-01', :age => 81})
begin
o = Output.new(dt)
o.freq('', :trtgrp).freq_by_trtgrp(:age, :sex, :race)
rescue => e
puts e
end
require "erb"
File.open('T14.2-Demo-Template.txt', 'r') do |t|
File.open('T14.2-Demo.rhtml', 'w') do |f|
f.puts ERB.new(t.read).result(o.get_binding)
end
end
This works in conjunction with the following text file template, called T14.2-Demo-Template.txt in the above code:
ACME Pharmaceuticals
Study XXXXXXX
Demographics Table
Placebo Drug
n=<%=@trtgrp[:frequency]['Placebo']%> n=<%=@trtgrp[:frequency]['Drug']%>
Sex
M <%=@sex[:frequency]['Placebo']['M']%> <%=@sex[:frequency]['Drug']['M']%>
F <%=@sex[:frequency]['Placebo']['F']%> <%=@sex[:frequency]['Drug']['F']%>
Race
White <%=@race[:frequency]['Placebo']['WHITE']%> <%=@race[:frequency]['Drug']['WHITE']%>
Black <%=@race[:frequency]['Placebo']['BLACK']%> <%=@race[:frequency]['Drug']['BLACK']%>
Asian <%=@race[:frequency]['Placebo']['ASIAN']%> <%=@race[:frequency]['Drug']['ASIAN']%>
Age
In the previous version of the code, the output was written to the console window. Now it is written to an output file, and looks like this:
ACME Pharmaceuticals
Study XXXXXXX
Demographics Table
Placebo Drug
n=8 n=8
Sex
M 5 4
F 3 4
Race
White 5 3
Black 3
Asian 5
Age
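The instance_eval strings above work, but the same counts can be built without generating Ruby source at run time. This sketch assumes the rows expose attribute readers, as DM does through attr_accessor, and uses Object#public_send and instance_variable_set. The Row struct and the convention of passing nil when there is no by variable are assumptions for illustration. With this approach, a misspelled variable raises NoMethodError instead of evaluating arbitrary text.

```ruby
# Hypothetical minimal row type standing in for the DM class above.
Row = Struct.new(:trtgrp, :sex, :race)

class Output
  attr_reader :trtgrp, :sex, :race

  def initialize(rows)
    @rows = rows
  end

  # freq(nil, :trtgrp)         -> @trtgrp[:frequency]['Placebo']
  # freq(:trtgrp, :sex, :race) -> @sex[:frequency]['Placebo']['M'], ...
  def freq(by_var, *vars)
    vars.each do |var|
      # Cells that were never counted read as 0 instead of nil.
      counts = by_var ? Hash.new { |h, k| h[k] = Hash.new(0) } : Hash.new(0)
      @rows.each do |row|
        value = row.public_send(var)
        if by_var
          counts[row.public_send(by_var)][value] += 1
        else
          counts[value] += 1
        end
      end
      instance_variable_set("@#{var}", { :frequency => counts })
    end
    self
  end
end

rows = [Row.new('Placebo', 'M', 'BLACK'), Row.new('Drug', 'F', 'ASIAN'),
        Row.new('Placebo', 'F', 'WHITE')]
o = Output.new(rows).freq(nil, :trtgrp).freq(:trtgrp, :sex, :race)
```

One side effect of the Hash.new(0) default: with the template above, the empty Black/Drug cell would print 0 instead of a blank.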
Thursday, January 8, 2009
Very draft version of the RubyRx clinical programming language
Thanks to John for posting those notes from the latest session.
Below is a rough draft of the RubyRx Domain Specific Language (DSL). RubyRx is a language that will be used to analyze data collected during clinical trials of pharmaceutical drugs and medical devices.
This will certainly look very familiar to the folks who joined us for the session on January 6th. I have made some changes to simplify and clarify the code.
The basic idea is that we create a class that has a single array instance variable, plus a method to push elements onto that array and a method to access the array from outside the object.
Then I instantiate an object from that class, and push a bunch of hashes into the object's array instance variable. The data in the hashes are simple versions of the rows of data that we typically find in demographics data collected during clinical trials. Each hash would be equivalent to a row in a data set (or a database table) and the key-value pairs in the hash are equivalent to the variables in a data set or database table. (By the way, this is just my quick and dirty way of creating some data to work with -- all of this will shortly be replaced by database tables, and ActiveRecord classes and methods.)
The next part is an Output class. In the Output class, the initialize method stores the object created above (with its demographics data) in the @sdtm_data instance variable. The Output class also has a freq_by_trtgrp method. This method accepts any number of symbols as arguments and produces an instance variable for each of them. For example, if the arguments are :race and :sex, instantiated objects of the Output class will have two instance variables (@race and @sex). These instance variables will be nested hashes 3 levels deep. The levels will be the type of statistic requested (:frequency, in this case), the dose group (say, 'Placebo'), and the values of race ('WHITE', 'ASIAN' and so forth), each with a corresponding integer of how often that particular value of race occurred. For example:
p @race # {:frequency => {'Placebo' => {'WHITE' => 12, 'ASIAN' => 7, 'BLACK' => 17}, '0.2 mg' => {'ASIAN' => 10, 'BLACK' => 3, 'WHITE' => 22}}}
The idea is to use this nested hash (along with similar ones for sex, age, BMI and so forth) to create an output summary table for demographics. I will write more on that in a later post.
Here's the code:
t0 = Time.new
def nested_hash(levels, &def_proc)
inner_hash = def_proc ? Hash.new(&def_proc) : Hash.new(0)
if levels == 1
inner_hash
else
l = [lambda {|h,k| h[k] = inner_hash}]
(levels-2).times do |i|
l[i+1] = lambda {|h,k| h[k] = Hash.new( &l[i] ) }
end
Hash.new(&l[-1])
end
end
class DM
attr_accessor :sex, :race, :brthdtc, :trtgrp
def initialize(obj)
@sex = obj[:sex]
@race = obj[:race]
@brthdtc = obj[:brthdtc]
@trtgrp = obj[:trtgrp]
end
end
class Output
attr_accessor :sex, :race, :brthdtc, :trtgrp
def initialize(sdtm_data)
@sdtm_data = sdtm_data
end
def freq_by_trtgrp(*args)
args.each do |arg|
instance_eval("@#{arg} = {}")
instance_eval("@#{arg}[:frequency] = nested_hash(2)")
@sdtm_data.a.each do |e|
instance_eval("@#{arg}[:frequency][e.trtgrp][e.#{arg}] += 1")
end
end
return self
end
end
class DM_Table
attr_accessor :a
def initialize
@a = []
end
def << (obj)
@a << obj
end
end
dt = DM_Table.new
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'BLACK', :brthdtc => '1968-10-01'})
dt << DM.new({:trtgrp => 'VELCADE', :sex => 'F', :race => 'ASIAN', :brthdtc => '1970-07-04'})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'WHITE', :brthdtc => '1968-10-01'})
dt << DM.new({:trtgrp => 'VELCADE', :sex => 'M', :race => 'ASIAN', :brthdtc => '1970-07-04'})
dt << DM.new({:trtgrp => 'VELCADE', :sex => 'F', :race => 'ASIAN', :brthdtc => '1970-07-04'})
dt << DM.new({:trtgrp => 'VELCADE', :sex => 'M', :race => 'WHITE', :brthdtc => '1970-07-04'})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'F', :race => 'WHITE', :brthdtc => '1968-10-01'})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'BLACK', :brthdtc => '1968-10-01'})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'WHITE', :brthdtc => '1968-10-01'})
dt << DM.new({:trtgrp => 'VELCADE', :sex => 'F', :race => 'ASIAN', :brthdtc => '1970-07-04'})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'M', :race => 'WHITE', :brthdtc => '1968-10-01'})
dt << DM.new({:trtgrp => 'VELCADE', :sex => 'M', :race => 'ASIAN', :brthdtc => '1970-07-04'})
dt << DM.new({:trtgrp => 'VELCADE', :sex => 'F', :race => 'ASIAN', :brthdtc => '1970-07-04'})
dt << DM.new({:trtgrp => 'VELCADE', :sex => 'M', :race => 'WHITE', :brthdtc => '1970-07-04'})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'F', :race => 'WHITE', :brthdtc => '1968-10-01'})
dt << DM.new({:trtgrp => 'Placebo', :sex => 'F', :race => 'BLACK', :brthdtc => '1968-10-01'})
begin
o = Output.new(dt)
o.freq_by_trtgrp(:sex, :race)
p o.sex
p o.race
rescue => e
puts e
ensure
puts "Elapsed time: #{ Time.new - t0 }"
end
The output of executing the above code is:
{:frequency=>{"VELCADE"=>{"M"=>9, "F"=>7}, "Placebo"=>{"M"=>9, "F"=>7}}}
{:frequency=>{"VELCADE"=>{"ASIAN"=>6, "WHITE"=>7, "BLACK"=>3}, "Placebo"=>{"ASIAN"=>6, "WHITE"=>7, "BLACK"=>3}}}
Elapsed time: 0.001752
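Both treatment groups in that output show identical counts, and M 9, F 7 is the total across all 16 rows. This is the nested_hash bug mentioned in the January 11 post. The outermost default proc assigns the same inner_hash object to every new key, so 'Placebo' and 'VELCADE' share one counts hash. Below is a sketch of one way to fix it that keeps the name and signature of nested_hash but builds a fresh sub-hash for every new key. It is not necessarily how RubyRx ended up solving the problem.

```ruby
# Returns a hash nested `levels` deep. Innermost values default to 0 (or to
# the given block's result); every new key gets its own fresh sub-hash.
def nested_hash(levels, &def_proc)
  if levels == 1
    def_proc ? Hash.new(&def_proc) : Hash.new(0)
  else
    Hash.new { |h, k| h[k] = nested_hash(levels - 1, &def_proc) }
  end
end

counts = nested_hash(2)
counts['Placebo']['M'] += 1
counts['VELCADE']['F'] += 1
```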
Please let me know if there are any questions or comments on this code.
Thanks,
Glenn
Tuesday, January 6, 2009
06Jan2009
Agenda:
New Venue
RubyRX
DabbleDB.com
Next week
YW Logging
More RubyRX
Future
IO JG
RSpec, Test::Unit
Method Missing
We are getting booted out of the library for a month. Glenn will look for a new venue.
Check out DabbleDB.com: you can upload your own spreadsheets, create forms, and make tables and figures.
RubyRx
Demographics
Tables are a bunch of columns and a bunch of rows.
Columns are often dose groups and total.
Rows are often Race, Age, Height, Weight
The way to do it in SAS is to add variables, manipulate the data, and transpose. It can be a lot of work. It might be simpler to do it with objects.
Nested Hashes
Ruby has a feature called Embedded Ruby (ERB).
ERB allows you to insert Ruby syntax into HTML. Each variable resolves to its value in the HTML.
You can also put Ruby code (loops, etc.) into the HTML.
An object creates the frequencies, then passes them into an HTML template.
Demographics Trt1 Trt2 Trt3 Trt4
Age x x x
Sex x x x
Race x x x
Hashes are similar to arrays, but use curly braces:
h = Hash.new
h = {}
You always need a key in a hash: {'x' => 1}, {'x' => []}, {'a' => {}}
The disadvantage is that you can't sort a hash, but you can put data into and pull data out of a hash quickly.
Sex Placebo
Male @sex['Placebo']['M']
Female
{"Sex" => {"Placebo" => {"M" => X, "F" => X}, "TRT1" => {"M" => Y, "F" => Y}}}
If this were on Rails, these hashes would collapse to their values as the hashes resolve.
Adverse events might be a little more complicated, because the number of rows will be dynamic, whereas in demographics you already know the rows.
class DM_Table
  def <<(obj)
    @a << obj # e.g. obj = {"sex" => "M", "trtgrp" => "Placebo"}
  end
end
Create an Output class:
class Output
def initialize(sdtm_data)
@sdtm_data=sdtm_data
end
end
# You'd use it like this:
# o = Output.new(dt)
# to create an object that contains my demographic data.
# Then you make a method that does the frequencies:
def freq_by_trtgrp(*args)
args.each do |arg|
# called like this: o.freq_by_trtgrp(:sex, :trtgrp)
:sex and :trtgrp are symbols.
*args allows you to pass any number of arguments,
and Ruby will put them into an array.
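Here is a tiny standalone sketch of the splat behaviour. The method body is a hypothetical stub that only reports its arguments and returns the collected array.

```ruby
# Hypothetical stub: *args gathers every argument the caller passes into one Array.
def freq_by_trtgrp(*args)
  args.each do |arg|
    puts "would count #{arg} by trtgrp"
  end
  args
end

freq_by_trtgrp(:sex, :trtgrp) # args is [:sex, :trtgrp]
freq_by_trtgrp(:race)         # args is [:race]
freq_by_trtgrp                # args is []
```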
# We need to create a hash in a funky way.
# Glenn refuses to explain why :)
# but he has a bunch of code that defines this method:
h = nested_hash(3)
# h[:a][:b][:c] += 1
# This allows you to fill all three levels of the hash at once and creates the default value.
# You'd use it like this:
h[:frequency]['Placebo']['M'] += 1
# => {:frequency => {"Placebo" => {"M" => 1}}}
Now you iterate through the instance variable:
# @sdtm_data is an array
@sdtm_data.each do |e|
h[:frequency][e.trtgrp][e.instance_variable_get("@#{arg}")] += 1
end
# Inject h into the object
instance_variable_set("@#{arg}", h)
end
# instance_variable_get goes into the object and grabs the instance variable named in the quotes
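Here is a standalone sketch of the two reflection calls used above. Ruby's core method for attaching an instance variable by name is Object#instance_variable_set, and instance_variable_get reads one by name. The Row and Holder classes are hypothetical.

```ruby
# Hypothetical row object: @sex has no accessor method.
class Row
  def initialize(trtgrp, sex)
    @trtgrp = trtgrp
    @sex = sex
  end
end

# Hypothetical empty object to attach results to.
class Holder; end

arg = :sex
row = Row.new('Placebo', 'M')

# Read an instance variable whose name is known only at run time.
value = row.instance_variable_get("@#{arg}")

# Attach a freshly built hash to another object under the same name.
holder = Holder.new
holder.instance_variable_set("@#{arg}", { :frequency => { 'Placebo' => { value => 1 } } })
p holder.instance_variables
```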
Sunday, January 4, 2009
Class for January 6th
Hi,
We are on for class on Tuesday January 6th at the usual time and place (4:30 at the Central Square Library).
Let's spend the class reviewing what we've learned so far. We will also introduce the logging facility in Ruby and go over the clinical programming framework that I am calling RubyRx. I want to show you some ideas that I have for inserting the results of a frequency method into a Clinical Study Report (CSR) template.
Hope to see you all there.
Glenn