当前位置: 动力学知识库 > 问答 > 编程问答 >

How to reorganize data from a list by rownames into new dataframes? R

问题描述:

I have multiple xts objects stored in a list each with 1000+ rows. They represent stock rolling window regression data. Each element has it's unique Ticker name. Here they are called Stock1, 2 ...etc for testing purposes. Rows are named by date as is the xts format. Each element is equal in dimensions. Each one looks like this:

 > tail(testlist$Stock1, n = 3)

(Intercept) rmrf smb hml rmw cma

2014-12-29 0.0003223177 1.010215 -0.02164844 -0.3322500 0.07819563 1.106934

2014-12-30 0.0002631315 1.002356 -0.02351438 -0.3465390 0.05954400 1.118506

2014-12-31 0.0002837304 1.000084 -0.01619536 -0.3494401 0.06121434 1.124845

> tail(testlist$Stock2, n = 3)

(Intercept) rmrf smb hml rmw cma

2014-12-29 0.0003308951 0.7503819 -0.1967255 -0.10242616 -0.2264914 0.8329570

2014-12-30 0.0003051495 0.7409709 -0.1899856 -0.07461764 -0.2240448 0.7921883

2014-12-31 0.0002614874 0.7478099 -0.1833077 -0.06197362 -0.2056615 0.7550211

> tail(testlist$Stock3, n = 3)

(Intercept) rmrf smb hml rmw cma

2014-12-29 -0.0003803988 0.8363603 -0.4153470 0.7459769 -0.7981382 -0.2839360

2014-12-30 -0.0004121386 0.8352243 -0.4224404 0.7405976 -0.8114066 -0.2790438

2014-12-31 -0.0004660716 0.8355641 -0.4343012 0.7571033 -0.8057412 -0.3026019

> tail(testlist$Stock4, n = 3)

(Intercept) rmrf smb hml rmw cma

2014-12-29 -0.0008295692 0.9296299 -0.07776571 0.007084297 -0.1377356 0.8038542

2014-12-30 -0.0007734696 0.9383387 -0.08941983 0.011685507 -0.1092656 0.7863335

2014-12-31 -0.0007591168 0.9391670 -0.08782070 0.015619229 -0.1083707 0.7924232

What i need to do:

Merge the rows by name and by aggregating all data in my list to obtain a new set of data. Each should look like this:

 Name Date (Intercept) rmrf smb hml rmw cma

Stock1 2014-12-29 0.0003223177 1.010215 -0.02164844 -0.3322500 0.07819563 1.106934

Stock2 2014-12-29 0.0003308951 0.7503819 -0.1967255 -0.10242616 -0.2264914 0.8329570

Stock3 2014-12-29 -0.0003803988 0.8363603 -0.4153470 0.7459769 -0.7981382 -0.2839360

Stock4 2014-12-29 -0.0008295692 0.9296299 -0.07776571 0.007084297 -0.1377356 0.8038542

Each such element should not be a time-series any more. but a static one, with each stock representing it's coeffiecient values at time "t". In terms of size each element should have a number of rows equal to the number of Stocks in the original list.

EDIT as asked by josilber

 > dput(list(Stock1=tail(testlist$Stock1, n = 3), Stock2=tail(testlist$Stock2, n = 3)))

structure(list(Stock1 = structure(c(0.000322317700198485, 0.000263131488679374,

0.000283730373928844, 1.01021497011709, 1.00235580055438, 1.00008407331697,

-0.0216484434660844, -0.023514378867335, -0.0161953614672028,

-0.332250031553704, -0.346538978804535, -0.349440052163927, 0.078195628743663,

0.0595439997647003, 0.0612143446991752, 1.1069343396633, 1.11850626745067,

1.12484530131584), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1419811200,

1419897600, 1419984000), tzone = "UTC", tclass = "Date"), .Dim = c(3L,

6L), .Dimnames = list(NULL, c("(Intercept)", "rmrf", "smb", "hml",

"rmw", "cma"))), Stock2 = structure(c(0.000330895099805035, 0.000305149500450527,

0.000261487411574969, 0.750381906747217, 0.740970893865186, 0.747809929767095,

-0.1967254672836, -0.189985607343021, -0.183307667378927, -0.10242615734439,

-0.0746176364711423, -0.0619736225998069, -0.226491384004977,

-0.224044849587752, -0.205661480898329, 0.832956994676299, 0.792188348360969,

0.755021100668421), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index = structure(c(1419811200,

1419897600, 1419984000), tzone = "UTC", tclass = "Date"), .Dim = c(3L,

6L), .Dimnames = list(NULL, c("(Intercept)", "rmrf", "smb", "hml",

"rmw", "cma")))), .Names = c("Stock1", "Stock2"))

I am completely in the dark. I have looked at some functions that may come to use: lapply / also the merge function seems to be suitable but it only works on 2 elements.

I will continue to update this post as i search for answers. If anyone has any leads or has done this before and can point in the right direction, thank you!

EDIT

#Flatten data add one more name and put into one data frame

all_coef_data<- do.call(rbind,Map(cbind,

Name=names(testlist),

Date=lapply(testlist,function(x) as.Date(as.POSIXct(c(attr(x,'index')),origin='1970-01-01'))),

lapply(testlist, as.data.frame)

))

A common denominator in each row that i need to get out is the Date. I now split the dataframe by Date using this. The output is a list.

out <- split(all_coef_data , f = all_coef_data$Date )

output:

> head(out$'2011-05-23', n=3)

Name Date (Intercept) rmrf smb hml rmw cma

Stock1.2011-05-23 Stock1 2011-05-23 -4.376389e-04 1.103582 -0.21747611 -0.1879211 -0.05849794 -0.1949192

Stock2.2011-05-23 Stock2 2011-05-23 1.115140e-04 1.198622 0.05422819 0.9998529 0.92141407 -0.8565260

Stock3.2011-05-23 Stock3 2011-05-23 5.457214e-05 1.303025 0.04705294 0.6897673 -0.19708983 -0.8247877

> tail(out$'2011-05-23', n=3)

Name Date (Intercept) rmrf smb hml rmw cma

Stock48.2011-05-23 Stock48 2011-05-23 0.0007354997 0.505054 0.1774544 -0.38934089 0.71775909 0.5189329

Stock49.2011-05-23 Stock49 2011-05-23 0.0004224351 1.304719 0.4511903 -0.64937062 -0.08872941 0.1545058

Stock50.2011-05-23 Stock50 2011-05-23 0.0003851261 1.020434 -0.1107910 -0.03964192 0.09526658 -0.4961902

网友答案:

Sounds like you want to

  1. flatten the xts objects to data.frames using as.data.frame(),
  2. cbind() new columns Name (from list component names) and Date (from xts row names, which actually come from the index attribute), and finally
  3. rbind() everything together into a single data.frame.

do.call(rbind,Map(cbind,
    Name=names(testlist),
    Date=lapply(testlist,function(x) as.Date(as.POSIXct(c(attr(x,'index')),origin='1970-01-01'))),
    lapply(testlist,as.data.frame)
));
##                     Name       Date  (Intercept)      rmrf         smb         hml         rmw       cma
## Stock1.2014-12-29 Stock1 2014-12-29 0.0003223177 1.0102150 -0.02164844 -0.33225003  0.07819563 1.1069343
## Stock1.2014-12-30 Stock1 2014-12-30 0.0002631315 1.0023558 -0.02351438 -0.34653898  0.05954400 1.1185063
## Stock1.2014-12-31 Stock1 2014-12-31 0.0002837304 1.0000841 -0.01619536 -0.34944005  0.06121434 1.1248453
## Stock2.2014-12-29 Stock2 2014-12-29 0.0003308951 0.7503819 -0.19672547 -0.10242616 -0.22649138 0.8329570
## Stock2.2014-12-30 Stock2 2014-12-30 0.0003051495 0.7409709 -0.18998561 -0.07461764 -0.22404485 0.7921883
## Stock2.2014-12-31 Stock2 2014-12-31 0.0002614874 0.7478099 -0.18330767 -0.06197362 -0.20566148 0.7550211

If you don't like the new row names, you can wrap the line in `rownames<-`(...,NULL).

网友答案:

Here is another solution.

library(xts)
library(magrittr)
library(dplyr)

stock1 = 
  data.frame(a = c(1, 2), b = c(1, 2) ) %>%
  xts(order.by = 
        c("2014-12-29", "2014-12-30") %>%
        as.Date)

stock2 = 
  data.frame(a = c(1, 2), b = c(1, 2) ) %>%
  xts(order.by = 
        c("2014-12-29", "2014-12-30") %>%
        as.Date)

test = list(stock1 = stock1, stock2 = stock2)

result = 
  1:length(test) %>%
  lapply(function(i)
    test[[i]] %>%
      as.data.frame %>%
      mutate(name = names(test)[i],
             date = rownames(.) %>% as.Date ) ) %>%
  bind_rows 
分享给朋友:
您可能感兴趣的文章:
随机阅读: