
regex - Finding multiple files from different folders using regular expressions

Question:

I'm trying to load multiple .txt files in R, from different folders.

I have problems writing the path and pattern using regular expressions.

My path has this structure:

'/Users/folderA/folderB/folderC/folderD/01_01_2012/folderE/file.txt'

So, the path is almost the same, except that the folder with the date name always changes.

I have tried to load it like this:

filesToProcess <- list.files(path = "/Users/folderA/folderB/folderC/folderD/",
                             pattern = "*_*_*/folderE/*.txt")

But this doesn't seem to work.

Could someone please help me write this with regular expressions?

Thanks a lot!

Answer:

The key here is to use argument recursive=TRUE so that you can search inside the folders that are in the original directory:

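# "\\.txt$" matches only file names that end in .txt;
# a bare "txt" would match those letters anywhere in the name
filesToProcess <- list.files(path = "/Users/folderA/folderB/folderC/folderD",
                             pattern = "\\.txt$", recursive = TRUE, full.names = TRUE)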

The pattern must match the names of the files; it cannot refer to the names of the folders (see ?list.files). That is why you need a second step to narrow the result down to the specific folders you want. Note the argument full.names = TRUE in the previous call, which keeps the path of each file. (NB: also drop the final / from the path argument; otherwise it ends up doubled in the output, which can cause an error when you later try to load the files.)

filesToProcess[grep("folderE", filesToProcess)]
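The two steps above can be tried end to end on a throwaway directory tree. This is only a sketch: the temporary folder, the extra "other" folder and the date-shaped regular expression are assumptions made for illustration, not part of the original answer.

```r
# Build a small tree that mimics the layout in the question
# (folder and file names below are made up for this demo).
root <- file.path(tempdir(), "folderD_demo")
unlink(root, recursive = TRUE)
for (d in c("01_01_2012", "02_01_2012")) {
  dir.create(file.path(root, d, "folderE"), recursive = TRUE)
  writeLines("x", file.path(root, d, "folderE", "file.txt"))
}
# A .txt file in a folder we do NOT want to pick up.
dir.create(file.path(root, "01_01_2012", "other"))
writeLines("x", file.path(root, "01_01_2012", "other", "skip.txt"))

# Step 1: every .txt file below root, keeping its path.
allTxt <- list.files(path = root, pattern = "\\.txt$",
                     recursive = TRUE, full.names = TRUE)

# Step 2: keep only files sitting directly in <dd_mm_yyyy>/folderE.
keep <- grepl("/[0-9]{2}_[0-9]{2}_[0-9]{4}/folderE/[^/]+$", allTxt)
filesToProcess <- allTxt[keep]

# Step 3: read each remaining file into one list element.
contents <- lapply(filesToProcess, readLines)
```

Filtering on the full path rather than on "folderE" alone avoids picking up a folderE that happens to sit somewhere else in the tree.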

A final note: your regular expression was flawed anyway. In a regular expression, * means (see ?regexp):

"The preceding item will be matched zero or more times."

What you wanted was . (or .* for any run of characters):

"The period . matches any single character."
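To make the difference between the wildcard * and the regular-expression * concrete, here is a small sketch that tests a few relative paths. The sample paths are invented; glob2rx() is the base R (utils) helper that translates a wildcard pattern into a regular expression.

```r
# Hypothetical relative paths, for illustration only.
paths <- c("01_01_2012/folderE/file.txt",
           "01_01_2012/folderF/file.txt",
           "notes/folderE/file.txt")

# In a regex, "." is one character and ".*" is any run of characters,
# so each wildcard "*" from the question becomes ".*".
datePattern <- "^.*_.*_.*/folderE/.*\\.txt$"
matches <- grepl(datePattern, paths)

# glob2rx() performs the wildcard-to-regex translation for you.
globAsRegex <- glob2rx("*_*_*/folderE/*.txt")
matchesGlob <- grepl(globAsRegex, paths)
```

Only the first path should match in both cases: the second is in the wrong folder and the third has no underscores in its first component.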

Answer:

Although the subject refers to regular expressions, it seems from the example that you really want to use globs (shell-style wildcards). In that case try:

Sys.glob("/Users/folderA/folderB/folderC/folderD/*_*_*/folderE/*.txt")
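As a rough sketch of how the Sys.glob() result might then be used, the example below builds a temporary tree, globs it, and reads every match into a list named after its date folder. The temporary directory and the naming step are assumptions added for illustration.

```r
# Temporary tree with two date folders (names made up for this demo).
root <- file.path(tempdir(), "glob_demo")
unlink(root, recursive = TRUE)
for (d in c("01_01_2012", "02_01_2012")) {
  dir.create(file.path(root, d, "folderE"), recursive = TRUE)
  writeLines(d, file.path(root, d, "folderE", "file.txt"))
}

# Wildcards may appear in any path component, folders included.
globbed <- Sys.glob(file.path(root, "*_*_*", "folderE", "*.txt"))

# Name each path after its date folder (two levels above the file),
# then read every file; lapply() carries the names over to the list.
names(globbed) <- basename(dirname(dirname(globbed)))
contents <- lapply(globbed, readLines)
```

Unlike list.files(), this needs no second filtering step, because the wildcard pattern already constrains the folder names.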