May 27, 2013

Unix Interview Questions: splitting and archiving files


Q. Can you write a Unix script that archives the files older than 7 days in a folder, say /data/csv? The files must be archived in groups of 10. For example, if there are 25 *.csv files under /data/csv, 3 tar files containing 10, 10, and 5 files will be created.

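A. First, define a configuration file in which each line holds 4 space-separated fields: the source directory, the archive directory, the age in find's -mtime form (+7 selects files modified more than 7 days ago), and the split size. For example: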

zip.cfg file

/cygdrive/c/data/csv         /cygdrive/c/data/csv/zip     +7    10
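The split size decides how many archives one run produces: N files in batches of S give ceil(N/S) tar files. A small sketch of that arithmetic in plain POSIX shell, using the question's 25 files and a split size of 10 (NFILES, NTARS and LAST are hypothetical names, not part of the script):

```shell
#!/bin/sh
# Hypothetical values: NFILES is the number of files moved into the archive
# directory, SPLITSIZE is the 4th field of zip.cfg (assumes NFILES > 0).
NFILES=25
SPLITSIZE=10

# Ceiling division: how many tar files are needed
NTARS=$(( (NFILES + SPLITSIZE - 1) / SPLITSIZE ))
# The last tar holds whatever is left (a full batch if it divides evenly)
LAST=$(( NFILES - (NTARS - 1) * SPLITSIZE ))

echo "$NTARS tar files, the last one holding $LAST files"
```

The same formula gives 1 archive for up to 10 files and 2 for 11 to 20.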

Now the shell script, which reads zip.cfg and archives the files:


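#!/bin/sh
SCRIPTDIR=`pwd`                        # directory the script is started from
CFG=zip.cfg                            # config file, read from SCRIPTDIR
SPLITPREFIX=$SCRIPTDIR/split_prefix    # assumed location: keeps the list files out of the archive dir
TODAY=`date '+%Y%m%d'`

# routine to split files into batches of, say, 10 and archive each batch
splitandzip () {
    rm -f "$SPLITPREFIX"*
    echo "cd $1; ls | split -l $2 - $SPLITPREFIX; counter=1"
    cd "$1" || return 1
    # write the file names into list files of at most $2 names each
    ls | split -l "$2" - "$SPLITPREFIX"
    counter=1
    ls -l "$SPLITPREFIX"*
    cat "$SPLITPREFIX"*
    # loop over the list files; sed removes any leading ./ sequences
    for FILENAME in $(/usr/bin/find "$SPLITPREFIX"* | sed 's/^\.\///'); do
        cat "$FILENAME" | xargs echo
        # one tar per list, e.g. /cygdrive/c/data/csv/zip20130527_1.tar
        cat "$FILENAME" | xargs tar -cvf "${1}${TODAY}_${counter}.tar"
        # delete the files that have just been archived
        cat "$FILENAME" | xargs rm -f
        counter=$((counter + 1))
    done
    cd "$SCRIPTDIR"
}

echo "Reading config file..... $CFG"

while read -r SOURCEDIR ARCHIVEDIR WAITTIME SPLITSIZE
do
    mkdir -p "$ARCHIVEDIR"
    echo " housekeeping....." $SOURCEDIR $ARCHIVEDIR $WAITTIME $SPLITSIZE
    cd "$SOURCEDIR" || continue
    # move the files directly under SOURCEDIR that are older than WAITTIME days into the archive dir
    find . -name . -o -prune -type f -mtime "$WAITTIME" -exec mv {} "$ARCHIVEDIR" \;
    cd "$SCRIPTDIR"
    splitandzip "$ARCHIVEDIR" "$SPLITSIZE"
done < "$CFG"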

Finally, save the script as, for example, hk_zip.sh and run it from the directory that contains zip.cfg:

sh hk_zip.sh

The commands used above are

Split command

ls  | split -l 2 - split_prefix

-l 2: put at most 2 lines (here, file names) in each output file
 -: read the input from standard input (here, the output of ls)
split_prefix: the prefix for the output file names

The ls command lists the file names, and split writes them 2 at a time into files named split_prefixaa, split_prefixab, split_prefixac, and so on; each of these files contains at most 2 of the names from ls. For example, split_prefixab holds the third and fourth names in the listing.
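To see the list files split produces, here is a throwaway sketch that runs the same ls | split -l 2 - split_prefix pipeline on five hypothetical files (a.csv to e.csv) in a temporary directory. It assumes mktemp -d is available, as it is on GNU/Linux, Cygwin and macOS:

```shell
#!/bin/sh
# Sketch: what "ls | split -l 2 - split_prefix" produces.
# The temporary directory and the names a.csv ... e.csv are hypothetical.
DEMO=$(mktemp -d) || exit 1
cd "$DEMO" || exit 1
touch a.csv b.csv c.csv d.csv e.csv

# ls sorts the names; split writes at most 2 of them per output file
ls | split -l 2 - split_prefix

# Show each list file with its names joined on one line
for f in split_prefix*; do
    echo "$f: $(paste -s -d ' ' "$f")"
done

set -- split_prefix*
COUNT=$#                                   # number of list files created
SECOND=$(paste -s -d ' ' split_prefixab)   # names in the second list file

cd / && rm -rf "$DEMO"
```

Because ls sorts the names, split_prefixaa gets a.csv and b.csv, split_prefixab gets c.csv and d.csv, and split_prefixac gets the remaining e.csv.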


Find command

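find  .  -name . -o -prune -type f -mtime +2 -exec mv {} /out/archive \;

-name . : matches the starting directory itself, so find still looks inside it
-o: Boolean OR; the tests after it apply to every other entry
-prune: do not descend into subdirectories, so only the files directly under . are considered
-type f: regular files only
-mtime +2: last modified more than 2 days ago (find counts whole 24-hour periods, so a file must be at least 3 days old to match)
-exec mv {} /out/archive \; : run mv for each selected file; {} is replaced by the file name and \; ends the command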
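The effect of -prune can be checked in a sandbox. This sketch (the temporary paths, the file names and the 2013 timestamps are hypothetical; touch -t is the POSIX way to backdate a file) puts two old files and one new file in a source directory, plus an old file inside a zip/ subdirectory, and then runs the same find expression:

```shell
#!/bin/sh
# Sketch: move only the top-level files older than 2 days, without
# descending into subdirectories. All paths and dates are hypothetical.
DEMO=$(mktemp -d) || exit 1
mkdir "$DEMO/src" "$DEMO/src/zip" "$DEMO/archive"
cd "$DEMO/src" || exit 1

touch -t 201301010000 old1.csv old2.csv zip/already_archived.csv   # backdated
touch new.csv                                                       # modified now

# "." matches -name . and is entered; every other entry is pruned, so zip/
# is never entered, and regular files older than 2 days are moved
find . -name . -o -prune -type f -mtime +2 -exec mv {} "$DEMO/archive" \;

MOVED=$(ls "$DEMO/archive" | paste -s -d ' ' -)
LEFT=$(ls | paste -s -d ' ' -)
KEPT=$(ls zip)
echo "archived: $MOVED"
echo "left in src: $LEFT"

cd / && rm -rf "$DEMO"
```

Only old1.csv and old2.csv are moved: new.csv is too recent, and zip/already_archived.csv is left alone because -prune stops find from entering zip/. That matters for the script, whose example config places the archive directory inside the source directory.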

Xargs Command

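ls  | xargs tar -cvf test.tar

ls lists the file names in the current directory; xargs reads those names from its standard input and appends them as arguments to tar -cvf test.tar. tar then creates (-c) the archive file (-f) test.tar containing all of the files, printing each name as it is added (-v). For a very long list, xargs may split the names across several tar runs, and each run would overwrite test.tar; feeding it only 10 names at a time avoids that.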
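Putting echo in front of the command is a handy way to preview what xargs would run without actually running it. A sketch with three hypothetical file names:

```shell
#!/bin/sh
# Sketch: xargs appends the names it reads to the command as arguments.
# The names are hypothetical sample data supplied with printf.
ALL=$(printf 'a.csv\nb.csv\nc.csv\n' | xargs echo tar -cvf test.tar)
echo "$ALL"

# With -n 2, xargs runs the command once per 2 names instead
BATCHES=$(printf 'a.csv\nb.csv\nc.csv\n' | xargs -n 2 echo tar -cvf test.tar)
echo "$BATCHES"
```

The first call shows a single tar command carrying all three names. With -n 2 there are two tar commands, and the second would overwrite the test.tar created by the first.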


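In the script above, xargs is used this way for each split list file. find prints the names of the list files, and the sed command 's/^\.\///' strips a leading ./ from each name (it replaces ./ at the start of the line with nothing). counter numbers the resulting tar files 1, 2, 3, and so on.

     for FILENAME in $(/usr/bin/find "$SPLITPREFIX"* | sed 's/^\.\///'); do
         cat "$FILENAME" | xargs echo
         cat "$FILENAME" | xargs tar -cvf "${1}${TODAY}_${counter}.tar"
         cat "$FILENAME" | xargs rm -f
         counter=$((counter + 1))
     done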
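The whole batch step can be tried end to end on the question's example. This sketch creates 25 hypothetical files in a temporary zip directory, splits the listing into lists of 10, and builds one tar per list. It follows the script's loop but uses tar -cf instead of -cvf to keep the output short; the file and directory names are assumptions, and mktemp -d is assumed to be available:

```shell
#!/bin/sh
# Sketch of the split-and-tar loop on 25 hypothetical files, split size 10.
DEMO=$(mktemp -d) || exit 1
mkdir "$DEMO/zip"
cd "$DEMO/zip" || exit 1
i=1
while [ "$i" -le 25 ]; do
    touch "file$i.csv"
    i=$((i + 1))
done

# Keep the list files outside the directory being listed
SPLITPREFIX=$DEMO/split_prefix
ls | split -l 10 - "$SPLITPREFIX"

counter=1
for FILENAME in "$SPLITPREFIX"*; do
    # one tar per list file, created next to (not inside) the zip directory
    xargs tar -cf "$DEMO/zip_$counter.tar" < "$FILENAME"
    # remove the batch that has just been archived
    xargs rm -f < "$FILENAME"
    counter=$((counter + 1))
done

# Number of members in each archive, joined on one line
COUNTS=$(for t in "$DEMO"/zip_*.tar; do tar -tf "$t" | wc -l | tr -d ' '; done | paste -s -d ' ' -)
REMAINING=$(ls | wc -l | tr -d ' ')
echo "files per archive: $COUNTS"

cd / && rm -rf "$DEMO"
```

The three archives hold 10, 10 and 5 files, and the zip directory is empty afterwards because each batch is deleted once it has been archived.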


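SCRIPTDIR=`pwd` runs the pwd (print working directory) command and stores its output, the directory the script was started from, in the variable SCRIPTDIR; the script uses it to cd back there after working in the source and archive directories.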
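Backquotes are the older form of command substitution; $(...) does the same thing and nests more easily. A short sketch showing both forms and the save-and-return pattern the script uses with SCRIPTDIR:

```shell
#!/bin/sh
# Both forms run pwd and capture its output in a variable
SCRIPTDIR=`pwd`    # backquote form, as used in the script
SAME=$(pwd)        # POSIX $(...) form

cd / || exit 1
echo "started in $SCRIPTDIR, now in $(pwd)"

# Return to the saved directory, as the script does after each step
cd "$SCRIPTDIR" || exit 1
```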

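The following loop reads zip.cfg one line at a time; read splits each line on whitespace into the 4 fields SOURCEDIR, ARCHIVEDIR, WAITTIME and SPLITSIZE.

while read -r SOURCEDIR ARCHIVEDIR WAITTIME SPLITSIZE
do
    # do something here with the 4 fields
done < zip.cfg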
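A runnable sketch of that loop against a temporary zip.cfg-style file. The first line is the document's example; the second line (a log directory with a 30-day age and a batch size of 50) is hypothetical, as is the use of mktemp:

```shell
#!/bin/sh
# Sketch: read a zip.cfg-style file, one config line per iteration.
CFG=$(mktemp) || exit 1
cat > "$CFG" <<'EOF'
/cygdrive/c/data/csv         /cygdrive/c/data/csv/zip     +7    10
/cygdrive/c/data/log         /cygdrive/c/data/log/zip     +30   50
EOF

LINES=0
while read -r SOURCEDIR ARCHIVEDIR WAITTIME SPLITSIZE
do
    # read splits each line on whitespace into the four variables
    echo "source=$SOURCEDIR archive=$ARCHIVEDIR age=$WAITTIME batch=$SPLITSIZE"
    LINES=$((LINES + 1))
    LAST_SPLITSIZE=$SPLITSIZE
done < "$CFG"

rm -f "$CFG"
```

Because the input comes from a redirection rather than a pipe, the loop runs in the current shell in POSIX shells such as ksh and bash, so variables like LINES keep their values after done.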

Note: If you are running Windows, you can practice the above code with MobaXterm, a free Unix emulator for Windows. Download the files MobaXterm_Personal_5.0.exe and MobaXterm.ini, plus the plugin Pdksh.mxt3 for the Korn shell. Put all these files in the same folder and create a shortcut to MobaXterm_Personal_5.0.exe to start the MobaXterm window.


