
Tips and Tricks: Splitting tar archives on the fly

Contributed by Alexander Todorov

Splitting big files into pieces is a common task. Another common task is creating a tar archive and splitting it into smaller chunks that can be burned onto CDs or DVDs. The straightforward approach is to create the archive and then run ‘split’ on it. That requires extra free disk space: about twice the size of the archive, because the archive and its chunks exist side by side. To avoid this limitation, split the archive as it is being created.

To create a tar archive that splits itself on the fly, use the following set of commands.

First, create the archive:

tar -czf /dev/stdout $(DIRECTORY_OR_FILE_TO_COMPRESS) | split -d -b $(CHUNK_SIZE_IN_BYTES) - $(FILE_NAME_PREFIX)

To extract the contents:

cat $(FILE_NAME_PREFIX)* >> /dev/stdout | tar -xzf /dev/stdin

The commands shown above work on the fly. You don’t need additional free space for temporary files.
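
A minimal, self-contained sketch of the round trip described above, for readers who want to try it on throwaway data first. The scratch directory, the file names, the 100000-byte chunk size and the ‘backup.’ prefix are all assumptions made for illustration, and ‘-f -’ is used here as the conventional shorthand for stdin/stdout.

```shell
#!/bin/sh
# Hypothetical scratch locations; none of these names come from the article.
set -e
work=$(mktemp -d)
mkdir -p "$work/src" "$work/out" "$work/restore"

# Sample data: about 300 KB of random (hence poorly compressible) bytes.
head -c 300000 /dev/urandom > "$work/src/data.bin"
echo "hello" > "$work/src/note.txt"

# Create: tar writes the gzip stream to stdout, split cuts it into 100000-byte chunks.
tar -czf - -C "$work" src | split -d -b 100000 - "$work/out/backup."

# Extract: cat joins the chunks in suffix order, tar reads the stream from stdin.
cat "$work/out/backup."* | tar -xzf - -C "$work/restore"

# Verify that the restored tree matches the original.
diff -r "$work/src" "$work/restore/src" && echo "round trip OK"
```

Comparing the restored tree against the source before deleting anything is a cheap way to confirm that the chunks really do reassemble into a usable archive.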

A few notes about this exercise:

  • ‘tar -L’ prompts you for every chunk it creates, and compression cannot be used with the -L option. The command above is not interactive, does not prompt for anything, and works with compression.
  • The number of chunk files is limited to 100, because we use numeric suffixes (‘split -d’), which are two digits long by default. If the specified chunk size is too small, you will get a ‘split: Output file suffixes exhausted’ error. Try a bigger chunk size or alphabetic suffixes.
  • ‘cat’ will concatenate the files in the correct order as long as they are not renamed, because the chunk suffixes appended by ‘split’ preserve the sort order.
  • Replace ‘tar -z’ with ‘tar -j’ for bzip2 compression, or try your favourite compression program. Almost all ‘tar’ and ‘split’ options should work.
  • The resulting chunk files are not valid tar archives and cannot be extracted separately. If you want that functionality, use ‘split-tar,’ which also needs more free space.
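
As a hedged illustration of the suffix limit mentioned in the notes: besides a bigger chunk size or alphabetic suffixes, the suffix width can be set explicitly with split’s ‘-a’ option. The sketch below uses a plain file rather than a tar pipeline so that the chunk count is predictable; the scratch directory, the ‘part.’ prefix and the sizes are assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical scratch file and prefix, used only for this illustration.
set -e
work=$(mktemp -d)
head -c 150000 /dev/urandom > "$work/big.bin"

# 150000 bytes in 1000-byte chunks needs 150 files. That is too many for
# two-digit suffixes (00-99), so request three-digit suffixes with -a 3.
split -d -a 3 -b 1000 "$work/big.bin" "$work/part."

count=$(ls "$work"/part.* | wc -l)
echo "chunks: $count"

# Reassemble the chunks and compare the result with the original file.
cat "$work"/part.* | cmp - "$work/big.bin" && echo "reassembled OK"
```

Because the suffixes are zero-padded to a fixed width, the shell glob still lists the chunks in the right order for ‘cat’.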

The information provided in this article is for your information only. The origin of this information may be internal or external to Red Hat. While Red Hat attempts to verify the validity of this information before it is posted, Red Hat makes no express or implied claims to its validity.

6 responses to “Tips and Tricks: Splitting tar archives on the fly”

  1. vsego says:

    Isn’t it easier to just omit the “f”?
    tar cz $(DIRECTORY_OR_FILE_TO_COMPRESS) | split -d -b $(CHUNK_SIZE_IN_BYTES) - $(FILE_NAME_PREFIX)
    cat $(FILE_NAME_PREFIX)* | tar xz

  2. Alexander Todorov says:

    vsego,
    You are right. I used /dev/stdin and /dev/stdout to make the example clearer.

  3. Andreas says:

    Hi Alexander,

    I applied your tip, but the result was a mess. I followed your commands:

    $ tar -czf /dev/stdout /home | split -d -b 4000m - backupPART

    It produced two files, which I burnt onto two DVD+Rs. [Estimating the *real* size of DVD+R media is another story (different meanings of 'GB', and different manufacturers have different DVD+R media sizes...).]

    I did not verify the compression, the creation of the tarball, or the DVD+R burning. I was in a hurry and I didn’t know how. (Heavy human failure, I know.)

    Then I repartitioned my laptop hard disk, installed Fedora 8, and ran your tar extract command:

    $ cat backupPART* >> /dev/stdout | tar -xzf /dev/stdin

    I get this:

    gzip: stdin: not in gzip format
    tar: Child died with signal 13

    My tarball seems to be corrupt. Googling up and down the web I found ‘gzrecover’, ‘TarFixer’ and the ‘GNU tar manual’ by the FSF.

    gzrecover crashed with a memory access error; TarFixer can’t be applied to compressed tarballs; but the manual got me learning ‘tar’.

    I asked for help at LinuxQuestions.org and fedoraforum.org, and the other users suspected that the target files (backupPART) were messed up because of the ‘/dev/stdout’ pseudo device. They said any (error) message produced while running tar would be redirected into a split backupPART file. I didn’t believe that. Despite being a real Linux newbie, I know that a process has three standard I/O streams (stdin, stdout and stderr) attached to the user’s tty. A second terminal running, e.g., ‘ls -la’ should not interfere with the first. And a program like ‘tar’ should send error or diagnostic messages to stderr, not to stdout.

    Really? If you run ‘tar’ in verbose mode, what happens? Does tar’s output go to stderr or stdout? I would guess stderr, because listing the processed files is a kind of diagnostic message and not the main purpose (as it is for ‘ls’).

    However, I am able to reproduce a messy tarball using your tip. But it’s not exactly the command you presented. I (accidentally) added the ‘-v’ option to ‘tar’, and all the verbose output went to stdout and therefore into split’s stdin.

    Do you know how I can recover my mixed-up, corrupted tarball? I’m a Linux newbie and don’t know the tiny command* for deleting tar’s verbose listing from my backupPART. But it should go like this:

    Read a (next) byte of tarball.mixed
    If byte.mixed is ‘/’ and next bytes are ‘home/..’
    then delete these bytes
    else go to begin.

    Thank you very much.

    Andreas

    *If you have any old RH Training textbooks you want to get rid of, send them to me. I’m eager for Linux knowledge :-)

  4. Alexander Todorov says:

    Andreas,
    it seems the -v option sends the file list to either stdout or stderr, depending on how the archive is specified:
    tar -czvf file.tar.gz [other args] : file list goes to stdout
    tar -czv [other args] : file list goes to stderr.

    The first line tells tar to save the archive to file.tar.gz, so the listing goes to stdout.
    Note that the second line does not specify -f, so tar assumes /dev/stdout by default.

    The example I’ve given uses -f /dev/stdout to be clearer, as vsego noted above. Apparently that leads to problems. I believe tar does not check whether -f is “/dev/stdout”. That sounds like a bug (either in the code or in the documentation).

    Unfortunately, I’m not aware of any way to recover your archive (split doesn’t really have anything to do with it). I advise you to ask on a tar-specific user/developer mailing list or forum.
    And you’re right: not paying attention to or validating your archives is a big mistake (one you probably won’t make again), especially when archiving your whole system.
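
The behaviour discussed in this exchange can be checked on throwaway data with a sketch like the following. It assumes GNU tar and gzip, and every path and file name is hypothetical. It writes the same tiny archive twice with -v, once to ‘-’ and once to /dev/stdout, captures what arrives on the pipe, and tests each captured stream with ‘gzip -t’.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names; GNU tar is assumed.
work=$(mktemp -d)
mkdir "$work/src"
echo "hello" > "$work/src/note.txt"

# Archive named "-": tar knows stdout carries the archive, so the -v
# listing goes to stderr (captured here in dash.err).
tar -czvf - -C "$work" src 2> "$work/dash.err" | cat > "$work/dash.tgz"

# Archive named /dev/stdout: tar treats it as an ordinary file name, so
# the -v listing may be written to stdout and mixed into the same pipe.
tar -czvf /dev/stdout -C "$work" src 2> "$work/dev.err" | cat > "$work/dev.tgz"

# Check both captured streams.
gzip -t "$work/dash.tgz" && echo "-f -: clean gzip stream"
gzip -t "$work/dev.tgz" 2> /dev/null || echo "-f /dev/stdout: listing leaked into the stream"
```

If the second check reports a leak, the practical fix is either to drop -v or to name the archive ‘-’, so that the listing and the archive data never share a stream.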

  5. Klaus Lichtenwalder says:

    Just a few nitbits… If you want to use stdin/stdout with tar, it’s simply a -
    e.g.: tar cf - . | (cd /elsewhere; tar xf -)

    cat always appends its arguments to stdout, so
    cat $(prefix)* | command
    is sufficient. I don’t know and (honestly) don’t care whether GNU tar sends its output to stdout if no f argument is given; every other Unix uses the default tape device (which is /dev/rmt) in that case (I have to work with Solaris and AIX too…).

  6. Ed Weinberg says:

    I use this to “ghost” partitions. My kids’ C: drive is /dev/hda2. I back up that “file” (really a partition), and when they break it I can easily restore it.