
linux - Select randomly a file from various directories and sort

Question:

I have a lot of text files spread across several directories. I would like to sort all of the files and write their names to a text file, in a specific, defined order. My initial idea is to pick, in random order, the files matching *1.txt from those directories, then repeat the process (*2.txt, *3.txt, etc.) until all file names are in the list. How can I accomplish this in bash?

The basics:

Randomly selects a file from one directory:

shuf -n1 -e *

Selects the first file from one directory:

ls | sort -n | head -1

EXAMPLE:

UPDATED: file structure / real file name format (these are just a few of the files; there are a few hundred)

Initial order:

media/sf_linux_sandbox/papers/
|-- semester_1
|   |-- cs630-linux_research_paper-fname_lname-001.txt
|   |-- cs635-progamming_languages-fname_lname-002.txt
|   |-- cs645-java_programming_paper-fname_lname-003.txt
|   `-- cs900-computer_robotics_capstone-fname_lname-004.txt
|-- semester_2
|   |-- cs650-software_methodologies-fname_lname-001.txt
|   |-- cs675-nosql_db_research-fname_lname-002.txt
|   |-- cs700-artificial_intelligence_reasearch-fname_lname-003.txt
|   |-- cs800-algorithms_and_computational_complexity-fname_lname-004.txt
|   |-- cs825-database_systems_internals-fname_lname-005.txt
|   `-- cs850-computer_graphics-fname_lname-006.txt
`-- semester_3
    |-- cs725-web_programming_technologies-fname_lname-001.txt
    |-- cs750-data_programming-fname_lname-002.txt
    `-- cs775-hardware_software_interface_paper-fname_lname-003.txt

The output I am looking to generate (randomly shuffle the files but keep the numerical order):

results.txt

/filepath/cs650-software_methodologies-fname_lname-001.txt
/filepath/cs630-linux_research_paper-fname_lname-001.txt
/filepath/cs725-web_programming_technologies-fname_lname-001.txt
/filepath/cs635-progamming_languages-fname_lname-002.txt
/filepath/cs750-data_programming-fname_lname-002.txt
/filepath/cs675-nosql_db_research-fname_lname-002.txt
/filepath/cs645-java_programming_paper-fname_lname-003.txt
/filepath/cs775-hardware_software_interface_paper-fname_lname-003.txt
/filepath/cs700-artificial_intelligence_reasearch-fname_lname-003.txt
/filepath/cs900-computer_robotics_capstone-fname_lname-004.txt
/filepath/cs800-algorithms_and_computational_complexity-fname_lname-004.txt
/filepath/cs825-database_systems_internals-fname_lname-005.txt
/filepath/cs850-computer_graphics-fname_lname-006.txt

Answer:

This shuffles all the files in the source tree, then sorts them on the numeric suffix with a stable sort, so files that share a suffix stay in shuffled order.

$ target=~/tmp/shuf
$ destination=/filepath/
$ tree $target
~/tmp/shuf
`-- papers
    |-- semester_1
    |   |-- cs630-linux_research_paper-fname_lname-001.txt
    |   |-- cs635-progamming_languages-fname_lname-002.txt
    |   |-- cs645-java_programming_paper-fname_lname-003.txt
    |   `-- cs900-computer_robotics_capstone-fname_lname-004.txt
    |-- semester_2
    |   |-- cs650-software_methodologies-fname_lname-001.txt
    |   |-- cs675-nosql_db_research-fname_lname-002.txt
    |   |-- cs700-artificial_intelligence_reasearch-fname_lname-003.txt
    |   |-- cs800-algorithms_and_computational_complexity-fname_lname-004.txt
    |   |-- cs825-database_systems_internals-fname_lname-005.txt
    |   `-- cs850-computer_graphics-fname_lname-006.txt
    `-- semester_3
        |-- cs725-web_programming_technologies-fname_lname-001.txt
        |-- cs750-data_programming-fname_lname-002.txt
        `-- cs775-hardware_software_interface_paper-fname_lname-003.txt

4 directories, 13 files
$ find $target -type f -iname "*.txt" \
   | shuf \
   | awk -F- '{printf("%s:%s\n", $0, $NF)}' \
   | sort -t : -k 2 -s \
   | cut -d : -f 1 \
   | xargs -n1 basename \
   | sed "s,^,$destination,"
/filepath/cs725-web_programming_technologies-fname_lname-001.txt
/filepath/cs650-software_methodologies-fname_lname-001.txt
/filepath/cs630-linux_research_paper-fname_lname-001.txt
/filepath/cs635-progamming_languages-fname_lname-002.txt
/filepath/cs750-data_programming-fname_lname-002.txt
/filepath/cs675-nosql_db_research-fname_lname-002.txt
/filepath/cs775-hardware_software_interface_paper-fname_lname-003.txt
/filepath/cs700-artificial_intelligence_reasearch-fname_lname-003.txt
/filepath/cs645-java_programming_paper-fname_lname-003.txt
/filepath/cs900-computer_robotics_capstone-fname_lname-004.txt
/filepath/cs800-algorithms_and_computational_complexity-fname_lname-004.txt
/filepath/cs825-database_systems_internals-fname_lname-005.txt
/filepath/cs850-computer_graphics-fname_lname-006.txt

To store the result in a file called filename, you can redirect:

$ find $target -type f -iname "*.txt" \
   | shuf \
   | awk -F- '{printf("%s:%s\n", $0, $NF)}' \
   | sort -t : -k 2 -s \
   | cut -d : -f 1 \
   | xargs -n1 basename \
   | sed "s,^,$destination," \
   > filename
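
The pipeline above follows a decorate, shuffle, stable-sort, undecorate pattern. As a minimal sketch of the same idea on a few made-up names, the variant below puts the key first and separates it with a tab, on the assumption that paths never contain tabs (the original uses ':' and so assumes paths never contain colons). shuffle_by_suffix is a hypothetical helper name, and GNU shuf and sort are assumed.

```bash
#!/usr/bin/env bash
# Sketch only: shuffle_by_suffix is a hypothetical helper, not part of the answer.
# Reads names on stdin, shuffles them, then stable-sorts on the text after the
# last '-' so that names sharing a suffix keep their shuffled order.
shuffle_by_suffix() {
    shuf \
      | awk -F- '{ printf "%s\t%s\n", $NF, $0 }' \
      | sort -s -t $'\t' -k1,1 \
      | cut -f2-
}

# Made-up sample names, only to show the grouping.
printf '%s\n' \
    semester_1/a-002.txt semester_2/b-001.txt \
    semester_3/c-001.txt semester_1/d-001.txt \
  | shuffle_by_suffix
```

Because sort -s is stable and -k1,1 restricts the comparison to the suffix, lines with equal suffixes stay in whatever order shuf gave them.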

Answer:

I'm not sure I fully understand what you're asking. Are you trying to sort numerically based on trailing digits in the filename? If so, you will need to supply exact specifications for your filenames so that a proper regex can be used to extract the digit and sort from there... I.e. are your filenames always [a-z][1-9] or are there multiple characters, special characters, etc? If you are able to supply the real paths you are using as well as the exact expected output then it would probably make things much easier.

To answer the question 'select randomly a file from various directories'... Here are two very similar methods to display the path of one random file from each subdirectory of your current directory.

while IFS= read -r dir; do
    find "$dir" -maxdepth 1 -type f | shuf -n1
done < <(find -type d) > results.txt

Or...

shopt -s globstar
for dir in ./**/; do
    find "$dir" -maxdepth 1 -type f | shuf -n1
done > results.txt
shopt -u globstar

If you want the basename of each random file (rather than the full path), you can replace the inner find command with the following:

random="$(find "$dir" -maxdepth 1 -type f | shuf -n1)"
[[ -n $random ]] && echo "${random##*/}"

If you only want random txt files to be selected then just append the option -name '*.txt' to the end of the inner find command.

Note that I used the shuf command since you mentioned it in your question, but it probably could have been solved just as easily using $RANDOM.
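
For comparison, here is a rough sketch of what a $RANDOM-based pick might look like. pick_random_file is a hypothetical name, and the modulo introduces a slight bias whenever the file count does not divide 32768 evenly, which is usually negligible for small directories.

```bash
#!/usr/bin/env bash
# Sketch only: pick_random_file is a hypothetical helper.
# Prints one randomly chosen regular file from the given directory,
# or returns 1 if the directory contains no regular files.
pick_random_file() {
    local dir=$1 f
    local files=()
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [[ -f $f ]] && files+=("$f")
    done
    (( ${#files[@]} > 0 )) || return 1
    # $RANDOM is in 0..32767; the modulo maps it onto a valid array index.
    printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
}
```

Calling pick_random_file semester_1 would then print one path from that directory, without spawning find or shuf.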

Answer:

I tried to match the output that you provided in your post at the time I composed the answer.

#!/bin/bash
usage_exit () {
    echo "usage: $0 <target-directory>" >&2
    exit 1   # a usage error should not report success
}

if [ $# != 1 ] ; then
    usage_exit
fi

# The pattern below searches files in the range 000 through 199.
# You can change the pattern to match your needs.
for n in {0..1}{0..9}{0..9}; do
    # Quote "$1" so that directory names containing spaces work.
    find "$1" -type f -name "*$n.txt" | shuf
done
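
If hard-coding the range is inconvenient, one possible variation is to derive the suffixes from the files themselves. The sketch below assumes names end in -<digits>.txt with zero-padded suffixes of equal width, so that a plain sort orders them numerically. list_by_suffix is a hypothetical name, and GNU find and shuf are assumed.

```bash
#!/usr/bin/env bash
# Sketch only: list_by_suffix is a hypothetical helper.
# Collects the distinct numeric suffixes found under the given directory,
# then prints the files for each suffix in shuffled order.
list_by_suffix() {
    local root=$1 suffix
    find "$root" -type f -name '*-*.txt' \
      | sed -n 's/.*-\([0-9][0-9]*\)\.txt$/\1/p' \
      | sort -u \
      | while IFS= read -r suffix; do
            find "$root" -type f -name "*-$suffix.txt" | shuf
        done
}
```

Running list_by_suffix papers would visit only the suffixes that actually occur, instead of probing all 200 patterns.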

Answer:

Another alternative is to store the results of find -type d in an array. Then find the largest number of regular files in any directory in the array and use that as the upper bound in a ((i=1; i<=max; i++)) loop. In each iteration, shuffle the array and traverse it, printing the $i-th file of each directory if it exists and nothing if it doesn't (i.e. if the directory has fewer than $i files).

#!/bin/bash

#shuffle function taken from http://mywiki.wooledge.org/BashFAQ/026
shuffle() {
   local i tmp size max rand

   # $RANDOM % (i+1) is biased because of the limited range of $RANDOM
   # Compensate by using a range which is a multiple of the array size.
   size=${#array[*]}
   max=$(( 32768 / size * size ))

   for ((i=size-1; i>0; i--)); do
      while (( (rand=$RANDOM) >= max )); do :; done
      rand=$(( rand % (i+1) ))
      tmp=${array[i]} array[i]=${array[rand]} array[rand]=$tmp
   done
}

destination=/filepath
max=0
shopt -s nullglob dotglob
while IFS= read -d $'\0' -r dir ; do
  array+=("$dir")
  # Count the same files that the main loop below selects from.
  count=$(find "$dir" -maxdepth 1 -type f -iname '*.txt' | wc -l)
  ((count>max)) && max=$count
done < <(find . -mindepth 1 -type d -print0)

for ((i=1; i<=max; i++)); do
  shuffle
  for dir in "${array[@]}"; do
    file=$(find "$dir" -maxdepth 1 -type f -iname '*.txt' | sort -n | awk "NR==$i")
    # Strip the leading directory so the output matches the example below.
    [[ -n $file ]] && echo "$destination/${file##*/}"
  done
done

Example

> tree
.
├── script
├── semester_1
│   ├── cs630-linux_research_paper-fname_lname-001.txt
│   ├── cs635-progamming_languages-fname_lname-002.txt
│   ├── cs645-java_programming_paper-fname_lname-003.txt
│   └── cs900-computer_robotics_capstone-fname_lname-004.txt
├── semester_2
│   ├── cs650-software_methodologies-fname_lname-001.txt
│   ├── cs675-nosql_db_research-fname_lname-002.txt
│   ├── cs700-artificial_intelligence_reasearch-fname_lname-003.txt
│   ├── cs800-algorithms_and_computational_complexity-fname_lname-004.txt
│   ├── cs825-database_systems_internals-fname_lname-005.txt
│   └── cs850-computer_graphics-fname_lname-006.txt
└── semester_3
    ├── cs725-web_programming_technologies-fname_lname-001.txt
    ├── cs750-data_programming-fname_lname-002.txt
    └── cs775-hardware_software_interface_paper-fname_lname-003.txt

3 directories, 14 files
> ./script
/filepath/cs725-web_programming_technologies-fname_lname-001.txt 
/filepath/cs650-software_methodologies-fname_lname-001.txt
/filepath/cs630-linux_research_paper-fname_lname-001.txt
/filepath/cs750-data_programming-fname_lname-002.txt
/filepath/cs635-progamming_languages-fname_lname-002.txt
/filepath/cs675-nosql_db_research-fname_lname-002.txt
/filepath/cs645-java_programming_paper-fname_lname-003.txt
/filepath/cs775-hardware_software_interface_paper-fname_lname-003.txt
/filepath/cs700-artificial_intelligence_reasearch-fname_lname-003.txt
/filepath/cs900-computer_robotics_capstone-fname_lname-004.txt
/filepath/cs800-algorithms_and_computational_complexity-fname_lname-004.txt
/filepath/cs825-database_systems_internals-fname_lname-005.txt
/filepath/cs850-computer_graphics-fname_lname-006.txt

Answer:

I think this may be useful:

dir='some/directory'
file=`/bin/ls -1 "$dir" | sort --random-sort | head -1`
path=`readlink --canonicalize "$dir/$file"` # Converts to full path
echo "The randomly-selected file is: $path"
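
As a hedged variation on the snippet above, the sketch below avoids parsing ls output by passing NUL-delimited names from find to shuf -z, so file names containing spaces or newlines survive. random_file_path is a hypothetical name, and GNU find, shuf and readlink are assumed.

```bash
#!/usr/bin/env bash
# Sketch only: random_file_path is a hypothetical helper.
# Prints the canonical path of one random regular file in the given directory,
# or returns 1 if there is none.
random_file_path() {
    local dir=$1 file
    # read -d '' consumes one NUL-terminated name; it fails on empty input.
    IFS= read -r -d '' file \
        < <(find "$dir" -maxdepth 1 -type f -print0 | shuf -z -n1) || return 1
    readlink --canonicalize "$file"
}
```

Calling random_file_path some/directory would print a full path much like the original snippet does, and would also skip subdirectories.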

Consider having a look at the following question.

Hope it helps.

Clemencio Morales Lucas.
