I'm a little confused by what the following in= step does.
Here is the code:
merge data2 data3 (in=inb);
I would really appreciate if someone can tell me what the in=inb here does.
DS_A DS_B ID VAR1 ID VAR2 A X A X B X B X C X D X data want; merge ds_a ds_b; by id; run;
will produce this
WANT: ID VAR1 VAR2 A X X B X X C X D X
If you add the IN= option you add a temporary and hidden variable that is 1 when the observation is present in that dataset, 0 otherwise, like this:
DS_A DS_B ID VAR1 ID VAR2 A X A X B X B X C X D X data want; merge ds_a (in=frs) ds_b (in=scn); by id; run; WANT: ID VAR1 VAR2 FRS SCN A X X 1 1 B X X 1 1 C X 1 0 D X 0 1
So you can play with this hidden variable to keep observations from one dataset or from both or from only one etc...
if frs; ---> keep ID=A B C if scn; ---> keep ID=A B D if frs and scn ---> keep ID=A B if frs and not scn --> keep ID=C etc..
One other aspect of the behaviour of the
in= option that I don't think anyone else has mentioned - if you merge two different datasets using the same
in= variable for both, and a row is in one but not the other, a value of 1 takes precedence over a value of 0. E.g.
data test; merge sashelp.class(where = (sex = 'F') in = a) sashelp.class(where = (sex = 'M') in = a); by name; put _all_; run;
In this case, a = 1 for every row, even though each row is only present in one of the input datasets.
SYNTAX section of the
merge Statement documentation, the data sets you are merging can have options. In this case you are using IN= Data Set Option. Below is the explanation of this option:
Creates a Boolean variable that indicates whether the data set contributed data to the current observation.
So in this case, you are naming this boolean variable
Thanks all for the help. But what does that do in my particular case?
let's assume the syntax is now something like the following:
data data1; merge data2 data3 (in=inb); by ID; if inb; drop Var1 Var2; run;
because the option (in=inb) is after data3, it is referred to this dataset. hence, you will have a boolean variable that will be 1 in data1 (final dataset) if that observation was present in data3, 0 otherwise.
Data2 Data3 ID ID A A B B C D
You will have
Data3 ID INB A 1 B 1 C 0 D 1
Adding the statement if INB; you will keep only observations with INB=1 (observations coming from data3)
Data3 ID A B D
merge data2 data3 (in=inb); by ID; if inb;
is the same as a right join in SQL.
Technically, "inb" is a 0/1 flag set to "1" for each record found in data3. "if inb" is shorthand for "if inb is true [then keep the record]", and for numeric fields "true" means greater than zero.