Problem Set 3

  • Released January 26, 2009
  • Due 5:00PM (Pacific Time) - February 2, 2009

Introduction

This problem set will be an exercise in integrating molecular measurements with clinical measurements, focusing on the study of chronic renal (kidney) failure.

You will be using a set of de-identified electronic medical record data obtained from a hospital (not Stanford).

  • If you are using our cluster (bmi217compute), we have set up a MySQL database (bmi217_pset3) with all the data that you will need for this problem set. Log in with username student and password student.
  • Your implementation should be a combination of R and MySQL.

There are several tables that you will need.

  • pat_test_histv: Clinical laboratory test results. Each row represents a single laboratory measurement on a single patient. PAT_NUM holds the patient identifiers. EVENT_START_DT_TM holds the date and time each test was obtained. TEST_ABBR holds the name of each test. RSLT_VAL holds the actual measurements. RSLT_UNIT_TXT holds the units for each measurement. REF_LOW_VAL and REF_HIGH_VAL indicate the low and high end of the normal range for each measurement.
  • pharmacy: Inpatient pharmacy drug records. Each row represents a single drug order on a single patient. PATIENT_RECORD_NUMBER holds the patient identifiers. DATE_OF_SERVICE holds the date each drug was used. MEDICATION_NAME describes the drug. TOTAL_PRICE indicates the amount charged.
  • pat_fin_acct: Hospital charges billed for each patient, by encounter. Each row represents a single set of charges for a single patient, for a single hospital encounter. PAT_NUM holds the patient identifiers. PRNCPL_PROCDR_CD is the principal procedure (CPT) code for the encounter. PRNCPL_DIAG_CD is the principal diagnosis (ICD) code for the encounter. ADMT_DIAG_CD is the original admitting diagnosis (ICD) code, when available. DISCH_DT_TM holds the date and time of discharge.
  • icd9: A reference table that provides a readable definition for each ICD9 diagnosis code.
  • cpt_code: A reference table that provides a readable definition for each CPT code.

Look here to learn more about ICD-9 codes. Look here to learn more about CPT codes. The key (pun intended) to joining the three main tables lies with the patient identifiers.
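
As a minimal sketch of what joining on the patient identifier looks like, the example below builds a tiny in-memory SQLite stand-in for two of the tables and averages one test per patient for one diagnosis code. It is written in Python only so that it runs anywhere; your submission should still use R and MySQL. The rows, the diagnosis code 'X1', and the assumption that RSLT_VAL is numeric are all made up for illustration.

```python
import sqlite3

# In-memory stand-in for the course MySQL database; every row is toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pat_fin_acct (PAT_NUM INTEGER, PRNCPL_DIAG_CD TEXT);
CREATE TABLE pat_test_histv (PAT_NUM INTEGER, TEST_ABBR TEXT, RSLT_VAL REAL);
INSERT INTO pat_fin_acct VALUES (1, 'X1'), (2, 'X2'), (3, 'X1');
INSERT INTO pat_test_histv VALUES
    (1, 'HGB', 10.0), (1, 'HGB', 12.0), (2, 'HGB', 14.0), (3, 'CR', 2.5);
""")

# Join encounters to lab results on the shared patient identifier, then
# average one test per patient for one (hypothetical) diagnosis code.
query = """
SELECT t.PAT_NUM, AVG(t.RSLT_VAL) AS mean_val
FROM pat_test_histv AS t
JOIN (SELECT DISTINCT PAT_NUM
      FROM pat_fin_acct
      WHERE PRNCPL_DIAG_CD = ?) AS p
  ON p.PAT_NUM = t.PAT_NUM
WHERE t.TEST_ABBR = ?
GROUP BY t.PAT_NUM
ORDER BY t.PAT_NUM
"""
rows = conn.execute(query, ("X1", "HGB")).fetchall()
print(rows)
```

The DISTINCT subquery keeps a patient with several matching encounters from having each lab row counted more than once in the average.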

Questions

1. Using these tables, list the patient numbers for those patients with any encounter during which a diagnosis was made of Chronic Renal Failure (CHRONIC RENAL FAILURE), Status/Post Kidney Transplantation (KIDNEY TRANSPLANT STATUS), or Complications of Kidney Transplantation (COMPL KIDNEY TRANSPLANT). In medical-speak, “status/post” means “previous”. (5 pts)

Despite these three seemingly separate diagnoses, we will collectively consider this single list of patients as patients with chronic renal failure for the rest of this assignment.

2. After listing these patient numbers, indicate which of these patients has clinical laboratory data available. (5 pts)

One marker for kidney function is serum creatinine (abbreviated “CR”). Rising serum creatinine is an indication of worsening renal failure.

3. Calculate the average creatinine for each patient with chronic renal failure, across all time points and measurements, and list these in a table. (5 pts)

The kidneys have many functions besides filtering the blood, including regulating blood pressure and secreting erythropoietin, which stimulates the bone marrow to make red blood cells. Chronic renal failure is associated with a decrease in the amount of secreted erythropoietin, resulting in anemia. The number of red cells in the blood is measured by Red Blood Cell count (abbreviated “RBC”), while the fraction of blood volume occupied by red blood cells is measured by Hematocrit (abbreviated “HCT”). Each red blood cell contains hemoglobin, to assist in carrying oxygen from the lungs to other tissues. The amount of hemoglobin in the blood is measured, appropriately, by Hemoglobin (abbreviated “HGB”).

4. Calculate the average hemoglobin for each patient with chronic renal failure, across all time points and measurements. (5 pts)

5. For every other patient in the database, calculate each patient’s mean hemoglobin across all time points and measurements. What is the mean of these mean hemoglobin measurements? (5 pts)

6. Are mean hemoglobin measurements significantly lower in patients with chronic renal failure than in all other patients in the database? Determine this using a t-test, and indicate your finding and the p-value. Draw a box-plot illustrating your finding. (5 pts)
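
To make the comparison concrete, here is a small sketch of the statistic behind an unequal-variance (Welch) t-test on two groups of per-patient means. It is in Python for illustration only, and the values are toy numbers, not taken from the database. It computes just the t statistic and the approximate degrees of freedom; in R, t.test reports these together with the p-value (it uses the Welch form by default, and alternative = "less" requests a one-sided test).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    # Squared standard error contributed by each group.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy per-patient mean hemoglobin values (hypothetical).
crf = [9.0, 10.0, 11.0, 10.0]
others = [13.0, 14.0, 15.0, 14.0]

t_stat, df = welch_t(crf, others)
print(round(t_stat, 4), round(df, 2))
```

A negative t statistic here means the first group has the lower mean, which is the direction the question asks about.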

As chronic renal failure progresses, one might expect both the filtering ability of the kidney and the erythropoietin secreting ability of the kidney to worsen together. To test this hypothesis, we can compare the mean hemoglobin and mean creatinine across our list of patients with chronic renal failure.

7. Do these correlate with each other? Test using Spearman’s rank correlation, because we have few values and they may not follow a normal distribution. Is this a statistically significant correlation? (5 pts)
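
For intuition, Spearman’s rank correlation is the ordinary Pearson correlation computed on the ranks of the values, with tied values sharing the average of their ranks. The sketch below shows that definition in Python on toy numbers; the function names are ours, and it returns only the coefficient. In R, cor.test with method = "spearman" gives both the coefficient and a p-value.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / spread


# Toy per-patient means (hypothetical).
mean_cr = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_hgb = [14.0, 12.0, 13.0, 10.0, 9.0]
rho = spearman_rho(mean_cr, mean_hgb)
print(round(rho, 3))
```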

We can test this hypothetical relationship using all the other patients in the database without chronic renal failure.

8. For every other patient in the database, calculate each patient's mean creatinine and mean hemoglobin across all time points and measurements, then compare them using Spearman’s rank correlation. Is this a statistically significant correlation (i.e. a significant p-value indicating a statistically significantly non-zero correlation)? What is the interpretation of this statistic compared to the statistic for the comparison calculated above for the patients with chronic renal failure? (5 pts)

The drug erythropoietin is used to treat the anemia in chronic renal failure.

9. Compare the mean hemoglobin in patients with chronic renal failure having been treated with erythropoietin (generic name “Epoetin”) even once, regardless of formulation or dose, versus the mean hemoglobin in patients with chronic renal failure never having been treated with erythropoietin. Perform this analysis using a t-test. Is there a significant difference in average hemoglobin in those patients treated with erythropoietin? What is your interpretation? (5 pts)

In this way, you have tested an association between the laboratory measurement of hemoglobin and the drug erythropoietin. This kind of association can be generalized. For every drug formulation in the database, there is a set of patients who were on the drug even once, and a set of patients who were never on the drug. Since every patient has a number of laboratory tests performed, every single type of laboratory measure can be evaluated to determine whether its measurements are statistically significantly different between the set of patients on the drug versus those not on the drug (t-test).

10. Perform the following evaluation across every drug, then, for each drug, across every laboratory measurement. For each pairing of drug and laboratory measurement, evaluate whether there is a statistically significant t-test association between the laboratory measurement and use/non-use of the drug. Report on the five strongest associations (lowest p-values) you find between a drug and a laboratory measurement. (20 pts) Don’t bother running a t-test when the number of patients on or off a drug is under 4. Warning: this will be an extensive computation.
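
One way to organize this loop is to precompute, for each drug, the set of patients ever on it and, for each laboratory test, each patient’s mean value, and then enumerate the pairs while skipping those with too few patients on either side. The sketch below does only that bookkeeping, in Python and on toy data. The function and variable names are hypothetical, and it assumes the cutoff of 4 is counted among patients who actually have the laboratory measurement, which is one reading of the rule above.

```python
from itertools import product


def eligible_pairs(patients_on_drug, lab_means, min_group=4):
    """Yield (drug, lab, on_values, off_values) for pairs worth testing.

    patients_on_drug: drug name -> set of patient ids ever on the drug
    lab_means: lab name -> {patient id: that patient's mean value}
    """
    for drug, lab in product(sorted(patients_on_drug), sorted(lab_means)):
        on_ids = patients_on_drug[drug]
        per_patient = lab_means[lab]
        on = [v for p, v in per_patient.items() if p in on_ids]
        off = [v for p, v in per_patient.items() if p not in on_ids]
        # Skip pairs where either group is too small for a t-test.
        if len(on) >= min_group and len(off) >= min_group:
            yield drug, lab, on, off


# Toy inputs (hypothetical): ten patients, two drugs, two labs.
patients_on_drug = {"A": {1, 2, 3, 4}, "B": {1, 2}}
lab_means = {
    "HGB": {p: 10.0 + p for p in range(1, 11)},
    "CR": {p: 1.0 + 0.1 * p for p in range(1, 6)},
}

pairs = list(eligible_pairs(patients_on_drug, lab_means))
kept = [(drug, lab) for drug, lab, _, _ in pairs]
print(kept)
```

Each yielded pair carries the two value lists, so the t-test can be run on them directly and the resulting p-values sorted afterwards.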

11. Though you have listed the five strongest associations, you now actually have a comprehensive set of drug-laboratory associations, each with a t-test p-value. Now, represent the significant relationships (arbitrarily p < 0.05) graphically. To do this, you will want to use the yEd Graph Editor, a free Java-based graph layout tool, which can read files in GRAPHML (XML) format. More information about this format is available here. First, get the yEd program working. Second, download this GRAPHML file, described here, and load it into yEd. Notice how all the nodes will be on top of each other. Select Organic/Classic from the Layout menu and click OK, and you will see the nodes and edges laid out in a pleasing way. The GRAPHML format is not particularly hard to output, even from R. Into a GRAPHML file, output all the significant relationships (arbitrarily p < 0.05) as edges, and the laboratory measures and drugs as nodes (different colors for these might be nice), then lay these out using yEd. Include the output in what you return to us. (10 pts)
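
To show how little is needed for a basic GRAPHML file, the sketch below writes drugs and laboratory measures as nodes and significant pairs as edges, using only core GRAPHML elements (graphml, key, graph, node, edge, data). It is in Python for illustration, and the edge list is toy data. The key names name, kind, and p_value are our own choice. Colors and visible labels in yEd rely on yEd’s own GRAPHML extensions, which are not shown here.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(edges):
    """edges: list of (drug, lab, p_value), already filtered to p < 0.05."""
    root = ET.Element("graphml", xmlns=NS)
    # Declare the custom data attached to nodes and edges.
    for key_id, target, attr_type in (("name", "node", "string"),
                                      ("kind", "node", "string"),
                                      ("p_value", "edge", "double")):
        ET.SubElement(root, "key", {"id": key_id, "for": target,
                                    "attr.name": key_id, "attr.type": attr_type})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")

    # Emit each distinct drug or lab once, with a generated id (n0, n1, ...).
    node_ids = {}
    for drug, lab, _ in edges:
        for kind, name in (("drug", drug), ("lab", lab)):
            if (kind, name) not in node_ids:
                node_ids[(kind, name)] = f"n{len(node_ids)}"
                node = ET.SubElement(graph, "node", id=node_ids[(kind, name)])
                ET.SubElement(node, "data", key="name").text = name
                ET.SubElement(node, "data", key="kind").text = kind

    # One edge per significant drug-lab pair.
    for drug, lab, p in edges:
        edge = ET.SubElement(graph, "edge",
                             source=node_ids[("drug", drug)],
                             target=node_ids[("lab", lab)])
        ET.SubElement(edge, "data", key="p_value").text = str(p)
    return ET.tostring(root, encoding="unicode")


# Toy significant associations (hypothetical).
edges = [("Drug A", "HGB", 0.001), ("Drug A", "CR", 0.02), ("Drug B", "HGB", 0.04)]
xml_text = build_graphml(edges)
print(xml_text)
```

Generated ids are used instead of drug names because names may contain spaces, which are not safe in XML id values; the readable name travels in a data element instead.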

EXTRA CREDIT: Consider the total price of each drug in the database, and calculate the total amount spent on drugs for each patient. For each diagnosis code, find the patients having that diagnosis at least once, sum the amount spent on drugs for those patients, then divide this sum by the total number of patients with that diagnosis. Report on the top five diagnoses with the highest cost for drugs per patient. (10 pts)
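
The arithmetic described above can be sketched as follows, in Python and on toy rows. The function name is hypothetical, and the sketch takes (patient, diagnosis code) pairs as given, leaving open which diagnosis column they come from. Patients with a diagnosis but no drug records count as having spent 0.

```python
from collections import defaultdict


def drug_cost_per_patient(drug_rows, diagnosis_rows):
    """drug_rows: (patient, price) pairs; diagnosis_rows: (patient, code) pairs."""
    # Total amount spent on drugs for each patient.
    spent = defaultdict(float)
    for patient, price in drug_rows:
        spent[patient] += price

    # Distinct patients having each diagnosis at least once.
    patients_by_dx = defaultdict(set)
    for patient, code in diagnosis_rows:
        patients_by_dx[code].add(patient)

    # Sum over those patients, divided by the number of patients.
    return {code: sum(spent.get(p, 0.0) for p in pts) / len(pts)
            for code, pts in patients_by_dx.items()}


# Toy rows (hypothetical).
drug_rows = [(1, 100.0), (1, 50.0), (2, 30.0)]
diagnosis_rows = [(1, "X1"), (2, "X1"), (2, "X1"), (3, "X2")]

costs = drug_cost_per_patient(drug_rows, diagnosis_rows)
top = sorted(costs.items(), key=lambda item: item[1], reverse=True)[:5]
print(top)
```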

Submission

Three steps:

1. Create a directory containing the following files:

  • All code/scripts. We need to be able to run your code! Make sure we will be able to do so.
  • A file called “readme.txt” explaining your technical code details. Write down exactly how to run your code. If you used libraries that we should install, note them here.
  • A PDF file called “ps3.pdf”, which is a summary of your approach (1 page max, 12pt Arial font, single spaced) and answers to our questions. This is where you explain your work, which is important for assigning partial credit. Free tools for converting Word documents to PDF exist, such as http://www.zamzar.com/

2. Zip the directory into one file called ps3_your_sunet_id.zip.

3. Email the ps3_your_sunet_id.zip file to bmi217submit@gmail.com.

Grade breakdown

You will need to explain your work.

  • 20 pts for well-commented working code. You can get partial credit for partially working code or non-working code that is well-commented.
  • 5 pts for readme.txt that clearly, concisely describes how to run your code.
  • 75 pts for ps3.pdf, for a clear, concise summary of what you did and answers to the questions.

Collaboration policy

You can talk with others in the class about this problem set, but you must turn in your own individual work and may not compare answers.

 
public/pset3_win0809.txt · Last modified: 2009/01/26 15:57 by ecoronap
 