How to express membership in multiple categories in R?

Question:

How does one express a linear model where observations can belong to multiple categories and the number of categories is large?

For example, using time dummies as the categories, here is a problem that is easy to set up since the number of categories (time periods) is small and known:

tmp <- "day 1, day 2
0,1
1,0
1,1"
periods <- read.csv(text = tmp)
y <- rnorm(3)
print(lm(y ~ day.1 + day.2 + 0, data=periods))

Now suppose that instead of two days there were 100. Would I need to create a formula like the following?

y ~ day.1 + day.2 + ... + day.100 + 0

Presumably such a formula would have to be created programmatically. This seems inelegant and un-R-like.
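For reference, the programmatic construction need not be long. A minimal sketch using `reformulate()` from the stats package, assuming the dummy columns are named day.1 through day.100 (these names are illustrative and `term.labels` is a hypothetical variable name):

```r
# Sketch: build y ~ day.1 + ... + day.100 with no intercept.
# The column names day.1 ... day.100 are assumed for illustration.
term.labels <- paste0("day.", 1:100)
f <- reformulate(term.labels, response = "y", intercept = FALSE)

# The formula refers to y plus the 100 dummy columns
length(all.vars(f))
```

The resulting formula object can be passed to `lm()` together with a data frame that holds y and the dummy columns.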

What is the right R way to tackle this? For example, aside from the formula problem, is there a better way to create the dummies than creating a matrix of 1s and 0s (as I did above)? For the sake of concreteness, say that the actual data consists (for each observation) of a start and end date (so that tmp would contain a 1 in each column between start and end).
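One possible way to build the dummies from start and end values, rather than typing the 1s and 0s, is sketched below. It assumes the dates have already been mapped to integer period indices; the data frame `obs` and its column names are hypothetical and chosen so that the result reproduces the small example above.

```r
# Hypothetical data: observation i is active from period start[i] to end[i]
# (integer period indices; real dates would first be mapped to such indices).
obs <- data.frame(start = c(2, 1, 1), end = c(2, 1, 2))
num.periods <- 2

# outer() compares every observation with every period index, so
# entry [i, j] is 1 when period j lies within [start[i], end[i]].
p <- seq_len(num.periods)
D <- 1 * (outer(obs$start, p, "<=") & outer(obs$end, p, ">="))
colnames(D) <- paste0("day.", p)
D
```

The same code should scale to 100 periods by changing `num.periods`, since no column is written out by hand.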


Update:

Based on the answer of @jlhoward, here is a larger example:

num.observations <- 1000
# Manually create 100 columns of dummies; data.frame() names them X1, ..., X100
periods <- data.frame(1*matrix(runif(num.observations*100) > 0.5, nrow = num.observations))
y <- rnorm(num.observations)
print(summary(lm(y ~ ., data = periods)))

It illustrates the manual creation of a data frame of dummies (1s and 0s). I would be interested in learning whether there is a more R-like way of dealing with this "multiple dummies per observation" issue.

Answer:

You can use the . notation to include all variables other than the response in a formula, and -1 to remove the intercept. Also, put everything in your data frame; don't make y a separate vector.

set.seed(1)    # for reproducibility
df  <- data.frame(y=rnorm(3),read.csv(text=tmp))
fit.1 <- lm(y ~ day.1 + day.2 + 0, df)
fit.2 <- lm(y ~ -1 + ., df)
identical(coef(fit.1),coef(fit.2))
# [1] TRUE
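
As a possible alternative for the larger case, `lm()` also accepts a numeric matrix as a single term in a formula, so the dummies can stay in a matrix. The sketch below reuses the simulated data from the question's update; the names `X`, `fit.m` and `fit.d` are illustrative. It compares the matrix term with the `.` notation:

```r
set.seed(1)    # for reproducibility
num.observations <- 1000

# 0/1 dummy matrix, simulated as in the question's update
X <- 1 * matrix(runif(num.observations * 100) > 0.5, nrow = num.observations)
y <- rnorm(num.observations)

# The matrix enters the formula as one term; -1 removes the intercept
fit.m <- lm(y ~ -1 + X)

# Same model via a data frame and the `.` notation
fit.d <- lm(y ~ -1 + ., data = data.frame(y = y, X))

# Compare the two coefficient vectors, ignoring their names
all.equal(unname(coef(fit.m)), unname(coef(fit.d)))
```

Both calls fit the same design matrix, so the choice is mostly about whether the dummies are more convenient to keep as a matrix or as data frame columns.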