当前位置: 动力学知识库 > 问答 > 编程问答 >

bash - Linux: Using split on limited space

问题描述:

I have a huge file on a linux machine. The file is ~20GB and the space on my box is ~25GB. I want to split the file into ~100mb parts. I know theres a 'split' command but that keeps the original file. I don't have enough space to keep the original. Any ideas on how this can be acomplished? I'll even work with any node modules if they make the task easier than bash.

网友答案:

My attempt:

#! /bin/bash

if [ $# -gt 2 -o $# -lt 1 -o ! -f "$1" ]; then
    echo "Usage: ${0##*/} <filename> [<split size in M>]" >&2
    exit 1 
fi

bsize=${2:-100}
bucket=$( echo $bsize '* 1024 * 1024' | bc )
size=$( stat -c '%s' "$1" )
chunks=$( echo $size / $bucket | bc )
rest=$( echo $size % $bucket | bc )
[ $rest -ne 0 ] && let chunks++

while [ $chunks -gt 0 ]; do
    let chunks--
    fn=$( printf '%s_%03d.%s' "${1%.*}" $chunks "${1##*.}" )
    skip=$(( bsize * chunks ))
    dd if="$1" of="$fn" bs=1M skip=${skip} || exit 1 
    truncate -c -s ${skip}M "$1" || exit 1 
done

The above assumes bash(1), and Linux implementations of stat(1), dd(1), and truncate(1). It should be pretty much as fast as it gets, since it uses dd(1) to copy chunks of the initial file. It also uses bc(1) to make sure arithmetic operations in the 20GB range don't overflow anything. However, the script was only tested on smaller files, so double check it before running it against your data.

网友答案:

You can use tail and truncate in a shell script to split a file in place, while destroying the original file. We are splitting the file in place backwards so that we can use the truncate. Here is a sample Bash script:

#!/bin/bash

if [ -z "$2" ]; then
   echo "Usage: insplit.sh <splitsize> <filename>"
   exit 1
fi

FILE="$2"
SPLITSIZE="$1"

FILESIZE=`stat -c '%s' $FILE`
BLOCKCOUNT=$(( (FILESIZE+SPLITSIZE-1)/SPLITSIZE ))
echo "Split count: $BLOCKCOUNT"

BLOCKCOUNT=$(($BLOCKCOUNT-1))
while [ $BLOCKCOUNT -ge 0 ]; do
  FNAME="$FILE.$BLOCKCOUNT"
  echo "writing $FNAME"
  OFFSET=$((BLOCKCOUNT * SPLITSIZE))
  BLOCKSIZE=$(( $FILESIZE - $OFFSET))
  tail -c "$BLOCKSIZE" $FILE > $FNAME
  truncate -s $OFFSET $FILE
  FILESIZE=$((FILESIZE-BLOCKSIZE))
  BLOCKCOUNT=$(( $BLOCKCOUNT-1 ))
done

I confirmed the results with a random file:

$ dd if=/dev/urandom of=largefile bs=512 count=1000
$ md5sum largefile
7ff913b62ef572265661a85f06417746  largefile
$ ./insplit.sh 200000 largefile
Split count: 3
writing largefile.2
writing largefile.1
writing largefile.0
$ cat largefile.0 largefile.1 largefile.2 | md5sum
7ff913b62ef572265661a85f06417746  -
分享给朋友:
您可能感兴趣的文章:
随机阅读: