data structures - How do I handle multiple kinds of missingness in R?


Many surveys have codes for different kinds of missingness. For example, a codebook might indicate:

0-99 Data

-1 Question not asked

-5 Do not know

-7 Refused to respond

-9 Module not asked

Stata has a beautiful facility for handling these multiple kinds of missingness: you can assign a generic . to missing data, but more specific kinds of missingness (.a, .b, .c, ..., .z) are allowed as well. All the commands that look at missingness report results for all missing entries however they are specified, but you can also sort out the different kinds of missingness later. This is especially useful when you believe that a refusal to respond has different implications for your imputation strategy than a question that was not asked.

I have never come across such a facility in R, but I would really like to have this capability. Are there any ways of marking several different types of NA? I could imagine creating more data (either a vector of length nrow(my.data.frame) containing the types of missingness, or a more compact index of which rows had which types of missingness), but that seems pretty unwieldy.

I know what you are looking for, and it is not implemented in R. I do not know of a package that implements it, but it is not too difficult to code yourself.

A workable way is to attach a data frame containing the codes as an attribute. To avoid doubling the entire data frame and to save space, I would store only the indices of the recoded cells in that data frame rather than rebuilding a complete data frame.

Example:

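    NACode <- function(x, code) {
        # Replace every value listed in `code` with NA, column by column
        Df <- sapply(x, function(i) {
            i[i %in% code] <- NA
            i
        })

        # Column-major positions of the NA cells and their 1-based row/column
        # (the -1/+1 keeps cells in the last row from wrapping to row 0)
        id <- which(is.na(Df))
        rowid <- (id - 1L) %% nrow(x) + 1L
        colid <- (id - 1L) %/% nrow(x) + 1

        # Index of the recoded cells together with their original codes
        NAdf <- data.frame(id, rowid, colid, value = as.matrix(x)[id])

        Df <- as.data.frame(Df)
        attr(Df, "NAcode") <- NAdf
        Df
    }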

This allows you to do:

    > Df <- data.frame(A = 1:10, B = c(1:5, -1, -2, -3, 9, 10))
    > code <- list("missing" = -1, "no answer" = -2, "don't know" = -3)
    > DfwithNA <- NACode(Df, code)
    > str(DfwithNA)
    'data.frame':   10 obs. of  2 variables:
     $ A: num  1 2 3 4 5 6 7 8 9 10
     $ B: num  1 2 3 4 5 NA NA NA 9 10
     - attr(*, "NAcode")='data.frame':  3 obs. of  4 variables:
      ..$ id   : int  16 17 18
      ..$ rowid: int  6 7 8
      ..$ colid: num  2 2 2
      ..$ value: num  -1 -2 -3

The function can also be adjusted to add an extra attribute that gives you the labels for the different values. You can transform back with:

    ChangeNAToCode <- function(x, code) {
        NAval <- attr(x, "NAcode")
        # Put back the original value for every cell whose code was requested
        for (i in which(NAval$value %in% code)) {
            x[NAval$rowid[i], NAval$colid[i]] <- NAval$value[i]
        }
        x
    }

This lets you change back only the codes you want, if that is ever necessary. The function can be adapted to restore all codes when no argument is given. Similar functions can be written to extract data based on the code; I think you can figure that one out yourself.
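
As a rough sketch of those two adaptations, something like the following might work. The names RestoreCodes and WhichCode are made up for illustration, and the example object is rebuilt by hand (a data frame with NAs plus an "NAcode" attribute with rowid, colid and value columns) so that the snippet runs on its own.

```r
# Example object rebuilt by hand: NAs in column B, plus the "NAcode"
# attribute that indexes the recoded cells and keeps their original codes.
DfwithNA <- data.frame(A = 1:10, B = c(1:5, NA, NA, NA, 9, 10))
attr(DfwithNA, "NAcode") <- data.frame(
  id    = 16:18,
  rowid = 6:8,
  colid = c(2L, 2L, 2L),
  value = c(-1, -2, -3)
)

# Hypothetical variant: if `code` is not given, every stored code is restored.
RestoreCodes <- function(x, code = NULL) {
  NAval <- attr(x, "NAcode")
  keep <- if (is.null(code)) seq_len(nrow(NAval)) else which(NAval$value %in% code)
  for (i in keep) {
    x[NAval$rowid[i], NAval$colid[i]] <- NAval$value[i]
  }
  x
}

# Hypothetical helper: list the cells that were set to NA for the given codes.
WhichCode <- function(x, code) {
  NAval <- attr(x, "NAcode")
  NAval[NAval$value %in% code, c("rowid", "colid", "value")]
}

all_back <- RestoreCodes(DfwithNA)              # restores -1, -2 and -3
partial  <- RestoreCodes(DfwithNA, c(-2, -3))   # row 6 of B stays NA
refused  <- WhichCode(DfwithNA, -2)             # one row: rowid 7, colid 2
```

Using NULL as the default keeps the "restore everything" case explicit; checking missing(code) inside the function would be another option.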

But in one line: using attributes and indices can be a nice way of doing this.
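
For the labels mentioned earlier, one possible sketch is to look up each stored value in a named code list and keep the label next to it in the index. The label column is an assumption of this sketch, not part of the functions above, and the index data frame is rebuilt by hand so the snippet is self-contained.

```r
# Named code list: the names are the labels, the values are the codes.
code <- list("missing" = -1, "no answer" = -2, "don't know" = -3)

# Index of recoded cells, in the same shape as the "NAcode" attribute
# (rebuilt by hand here; colid is recycled to all three rows).
NAdf <- data.frame(id = 16:18, rowid = 6:8, colid = 2L, value = c(-1, -2, -3))

# Match each stored value against the codes and attach the matching label.
NAdf$label <- names(code)[match(NAdf$value, unlist(code))]

# Tabulate how often each kind of missingness occurs.
label_counts <- table(NAdf$label)
```

Stored this way, the labelled index can be attached with attr() exactly like the unlabelled one.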
