当前位置: 动力学知识库 > 问答 > 编程问答 >

r - Creating data frame with dates and data from a text files

问题描述:

I want to create a data frame from a files. First I have downloaded the ftp files in to my pc from internet. Files can be found in this link: data from ftp. I want to create a data frame with date (taken from the Launch date line of each file) on one column and data on other columns as well.

For grabbing all files I used the following code:

`setwd("C:/Users/")

path = "~C:/Users/"

files <- lapply(list.files(pattern = '\\.l100'), readLines)

test.sample<-do.call(rbind, lapply(files, function(lines){

# for each file, return a data.frame of the datetime, pulled with regex

data.frame(datetime = lubridate::dmy(sub('^.*Launch Date* : ', '', lines[6])),

# and the data, read in as text

read.table(text = lines[13:length(lines)]))

}))`

I am wondering what is wrong with the above code. If you can write a new code would be great too. Thank you in advance.

The data looks like this:

National Oceanic and Atmospheric Administration, U.S. Department of Commerce

Station : Pago Pago, American Samoa

Station Height : 5 meters

Latitude : -14.33

Longitude : -170.71

Flight Number : ASA016

Launch Date : 18 July 1986

Launch Time : 02:31:00 GMT

Radiosonde Type:

Radiosonde Num :

O3 Sonde ID : 4A1834 Background: 0.050 microamps (0.20 mPa) Flowrate: 29.90 sec/100ml RH Corr: 2.00 %

Sonde Total O3 : 268 (51) DU Sonde Total O3 (SBUV): 251 (35) DU

Level Press Alt Pottp Temp FtempV Hum Ozone Ozone Ozone Ptemp O3 # DN O3 Res

Num hPa km K C C % mPa ppmv atmcm C 10^11/cc DU

0 1007.7 0.005 304.0 31.5 26.0 73 1.63 0.016 0.0000 38.4 3.883 268

1 9999.9 0.100 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999

2 982.0 0.200 301.2 26.5 21.4 74 1.69 0.017 0.0003 38.4 4.096 267

3 9999.9 0.300 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999

4 960.0 0.400 301.2 24.6 20.6 79 1.64 0.017 0.0006 38.4 3.983 267

5 9999.9 0.500 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999

6 946.0 0.600 301.1 23.2 20.2 83 1.65 0.017 0.0008 38.4 4.021 267

7 932.0 0.700 300.9 21.8 19.9 89 1.65 0.018 0.0010 38.5 4.040 267

8 920.9 0.800 301.1 21.0 16.4 75 1.65 0.018 0.0012 38.5 4.051 266

9 9999.9 0.900 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999

10 9999.9 1.000 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999

11 893.0 1.100 302.0 19.2 12.1 64 1.58 0.018 0.0016 38.5 3.923 266

12 9999.9 1.200 9999.9 999.9 999.9 999 99.90 99.999 99.9990 999.9 999.999 9999

I have many files like this. Thank you again.

网友答案:

Though I like the metadata and somewhat self-documentation present in this fixed-width table file, it can be interesting to import them programmatically. Excel happens to do alright since it doesn't complain about lines that don't match the pattern, but that doesn't help import numerous files.

So we need something a little home-baked. I only downloaded two of the files, but this should work equally well with many more. (I'm assuming the format is consistent, otherwise we'll need to fix some of the magic constants such as 15 (rows to ignore) and 29 (rows that include the table headers).)

(files <- list.files(pattern = "asa*"))
# [1] "asa001_1986_04_01_20.l100" "asa002_1986_04_14_18.l100"

This is the workhorse. Instead of "just" taking the date as you requested, it's about the same effort to grab all the metadata headers. They made it a little problematic by putting multiples on some lines (and some labels with a space), but I think I found a suitable pattern to find them.

alldfs <- lapply(files, function(f) {
  # read in all but the data; two MAGIC CONSTANTS
  txt <- readLines(f, n = 29)[-(1:15)]
  i <- which(! nzchar(trimws(txt)))
  hdrs <- sapply(strsplit(txt[1:(i-1)], ": "), trimws)
  hdrsdf <- as.data.frame(Reduce(c, lapply(hdrs, function(hdr) {
    z <- strsplit(hdr, "  ")
    if (length(z) == 1) {
      setNames(NA_character_, nm = z[[1]])
    } else {
      setNames(lapply(z[-1], head, n=1), nm = trimws(sapply(z[-length(z)], tail, n=1)))
    }
  })), stringsAsFactors = FALSE)
  df <- as.data.frame(lapply(read.table(f, skip=27, stringsAsFactors=F)[-1,],
                             as.numeric))
  cbind.data.frame(hdrsdf, df)
})
length(alldfs)
# [1] 2

The result is a wide data.frame. The first 13 columns are constant, the metadata headers. If you only want Launch.Date, feel free to drop the rest.

str(alldfs[[1]])
# 'data.frame': 361 obs. of  27 variables:
#  $ Station              : chr  "Pago Pago, American Samoa" "Pago Pago, American Samoa" "Pago Pago, American Samoa" "Pago Pago, American Samoa" ...
#  $ Station.Height       : chr  "5 meters" "5 meters" "5 meters" "5 meters" ...
#  $ Latitude             : chr  "-14.33" "-14.33" "-14.33" "-14.33" ...
#  $ Longitude            : chr  "-170.71" "-170.71" "-170.71" "-170.71" ...
#  $ Flight.Number        : chr  "ASA001" "ASA001" "ASA001" "ASA001" ...
#  $ Launch.Date          : chr  "01 April 1986" "01 April 1986" "01 April 1986" "01 April 1986" ...
#  $ Launch.Time          : chr  "20:40:00 GMT" "20:40:00 GMT" "20:40:00 GMT" "20:40:00 GMT" ...
#  $ Radiosonde.Type      : chr  NA NA NA NA ...
#  $ Radiosonde.Num       : chr  NA NA NA NA ...
#  $ O3.Sonde.ID          : chr  "4A1911" "4A1911" "4A1911" "4A1911" ...
#  $ Background           : chr  "0.050 microamps (0.19 mPa)" "0.050 microamps (0.19 mPa)" "0.050 microamps (0.19 mPa)" "0.050 microamps (0.19 mPa)" ...
#  $ Flowrate             : chr  "29.10 sec/100ml" "29.10 sec/100ml" "29.10 sec/100ml" "29.10 sec/100ml" ...
#  $ RH.Corr              : chr  "2.00 %" "2.00 %" "2.00 %" "2.00 %" ...
#  $ Sonde.Total.O3       : chr  "242" "242" "242" "242" ...
#  $ Sonde.Total.O3..SBUV.: chr  "233 (36) DU" "233 (36) DU" "233 (36) DU" "233 (36) DU" ...
#  $ Level                : num  1010 10000 10000 10000 971 ...
#  $ Press                : num  0.005 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 ...
#  $ Alt                  : num  302 10000 10000 10000 302 ...
#  $ Pottp                : num  30.2 999.9 999.9 999.9 25.8 ...
#  $ Temp                 : num  24.6 999.9 999.9 999.9 20.7 ...
#  $ FtempV               : num  72 999 999 999 74 999 999 999 75 999 ...
#  $ Hum                  : num  1.65 99.9 99.9 99.9 1.68 99.9 99.9 99.9 1.48 99.9 ...
#  $ Ozone                : num  0.016 99.999 99.999 99.999 0.017 ...
#  $ Ozone.1              : num  0e+00 1e+02 1e+02 1e+02 5e-04 ...
#  $ Ozone.2              : num  38.1 999.9 999.9 999.9 38.1 ...
#  $ Ptemp                : num  3.93 1000 1000 1000 4.07 ...
#  $ O3                   : num  242 9999 9999 9999 241 ...
分享给朋友:
您可能感兴趣的文章:
随机阅读: