当前位置: 动力学知识库 > 问答 > 编程问答 >

r - How to write a function which can count individuals once even with multiple matches across columns

问题描述:

I have a data set which has patient diagnostic (ICD-9) codes, which can have a length between 3-5 digits, where the first three digits represent a classification of diagnosis, and the 4th and 5th represent a further refinement of the classification. For example:

zz<-" dx1 dx2 dx3

1 64251 82381 8100

2 8052 8730 51881

3 64421 431 81601

4 3041 29690 9920

5 72888 8782 59080

6 7245 60886 8479

7 291 4659 4739

8 30410 30400 95901

9 2929 30500 8208

10 7840 6268 8052"

df<-read.table(text=zz, header=TRUE)

Each row of codes represents multiple diagnoses of the same individual. I have written a series of ifelse statements to create a new variable with the codes Im interested in so they are mapped to numbers representing different diagnoses of interest:

df$x<-ifelse(grepl("^291", dx1),1, ifelse(grepl("^292", dx1),1

ifelse(grepl("^3040", dx1),2,ifelse(grepl("^3047", dx1),2,

ifelse(grepl("^3051", dx1),3,ifelse(grepl("^98984", dx1),3,0))))))

Where I run into trouble is when I want to check for these select codes across each of the columns containing diagnostic codes. I attempted to write a function for this:

df$alldx<-apply(df[,c(1:3)],MARGIN = 2, function(dx){

ifelse(grepl("^291", dx),1, ifelse(grepl("^292", dx),1

ifelse(grepl("^3040", dx),2,ifelse(grepl("^3047", dx),2,

ifelse(grepl("^3051", dx),3,ifelse(grepl("^98984", dx),3,0))))))

})

The problem is I only want to count an individual once if they have one of the codes of interest; in the case of multiple code matches, then that persons code should be whichever diagnosis was given first. I feel like there must be a way to do this, but its well beyond my coding abilities!

网友答案:

Here's what I would do, using the data.table package for convenience:

library(data.table)
setDT(df)[, id := .I]

DF = melt(df, id="id")[, 
  `:=`(diag = substr(value, 1, 3), ref = substr(value, 4, 5))][order(id)]

So now the data looks like

    id variable value diag ref
 1:  1      dx1 64251  642  51
 2:  1      dx2 82381  823  81
 3:  1      dx3  8100  810   0
 4:  2      dx1  8052  805   2
 5:  2      dx2  8730  873   0
 6:  2      dx3 51881  518  81
 7:  3      dx1 64421  644  21
 8:  3      dx2   431  431    
 9:  3      dx3 81601  816  01
10:  4      dx1  3041  304   1
11:  4      dx2 29690  296  90
12:  4      dx3  9920  992   0
13:  5      dx1 72888  728  88
14:  5      dx2  8782  878   2
15:  5      dx3 59080  590  80
16:  6      dx1  7245  724   5
17:  6      dx2 60886  608  86
18:  6      dx3  8479  847   9
19:  7      dx1   291  291    
20:  7      dx2  4659  465   9
21:  7      dx3  4739  473   9
22:  8      dx1 30410  304  10
23:  8      dx2 30400  304  00
24:  8      dx3 95901  959  01
25:  9      dx1  2929  292   9
26:  9      dx2 30500  305  00
27:  9      dx3  8208  820   8
28: 10      dx1  7840  784   0
29: 10      dx2  6268  626   8
30: 10      dx3  8052  805   2
    id variable value diag ref

where id is the patient ID, diag is the diagnosis and ref is the refinement, if any. We can see that there is not a lot of duplication in diagnoses in the example data:

DF[, sum(duplicated(diag)), by=id]

    id V1
 1:  1  0
 2:  2  0
 3:  3  0
 4:  4  0
 5:  5  0
 6:  6  0
 7:  7  0
 8:  8  1
 9:  9  0
10: 10  0

Now, if we wanted to count how many patients have a diagnosis of 304, we can do:

DF[diag == "304", uniqueN(id)]

[1] 2

If you wish to map codes to new codes (which might be more confusing than anything else), I'd recommend doing so by making a separate table describing the mapping and merging on it to assign the new codes. I think data.table is also handy for that. The intro materials for the package are a good place to start with it.


A brief example for merging:

m = fread("
diag new_code
291 1
292 1 
304 2
305 3
989 3", colClasses=c(diag = "character"))

DF[m, on="diag", new_code := i.new_code ]
分享给朋友:
您可能感兴趣的文章:
随机阅读: