Unix Interview Questions: splitting and archiving files
Unix for software developers
Q. Can you write a Unix script that archives files older than 7 days from a folder, say /data/csv? The files need to be split into groups of 10 per archive. For example, if there are 25 *.csv files under /data/csv, 3 tar files containing 10, 10 and 5 files should be created.
A. First, define a configuration file that contains the source directory, the archive directory, how many days old a file must be (in find's -mtime notation), and the split size. For example:
zip.cfg file
/cygdrive/c/data/csv /cygdrive/c/data/csv/zip +7 10
Next, the shell script zip.sh reads zip.cfg and archives the files.
#!/bin/ksh
TODAY=`date '+%Y%m%d'`
SPLITPREFIX="/cygdrive/c/temp/abcdef_split"

# routine to split the archived files into batches of, say, 10 and tar each batch
# $1 = archive directory, $2 = number of files per tar
splitandzip () {
  rm -f $SPLITPREFIX*        # remove list files left over from a previous run
  echo "cd $1; ls | split -l $2 - $SPLITPREFIX; counter=1"
  cd $1; ls | split -l $2 - $SPLITPREFIX; counter=1
  ls -l $SPLITPREFIX*
  cat $SPLITPREFIX*
  # This loop goes through the split list files (abcdef_splitaa, abcdef_splitab, ...); sed removes any leading ./ sequences
  for FILENAME in $(/usr/bin/find $SPLITPREFIX* | sed 's/^\.\///'); do
    cat $FILENAME | xargs echo
    # remove the original files only if the tar was created successfully
    cat $FILENAME | xargs tar -cvf $1$TODAY'_'$counter'.tar' && cat $FILENAME | xargs rm -f
    ((counter=counter+1))
  done
}

CFG=$1
echo "Reading config file..... "$CFG
SCRIPTDIR=`pwd`
while read SOURCEDIR TARGETDIR WAITTIME SPLITSIZE
do
  ARCHIVEDIR="$TARGETDIR$TODAY/"
  mkdir -p $ARCHIVEDIR
  echo " housekeeping....." $SOURCEDIR $ARCHIVEDIR $WAITTIME $SPLITSIZE
  cd $SOURCEDIR
  # move the files directly under SOURCEDIR (sub-directories, including the archive dirs, are pruned) that are older than x days into the archive dir
  find . -name . -o -type d -prune -o -type f -mtime $WAITTIME -exec mv {} $ARCHIVEDIR \;
  cd $SCRIPTDIR
  splitandzip $ARCHIVEDIR $SPLITSIZE
done < $CFG
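Finally, run the script with the configuration file as its argument. Use ksh (or make the script executable and run ./zip.sh zip.cfg) rather than sh, because invoking it with sh ignores the #!/bin/ksh line, and ((counter=counter+1)) is not POSIX sh syntax.
ksh zip.sh zip.cfg
The commands used above are explained below.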
Split command
ls | split -l 2 - split_prefix
-l: number of lines (here, file names) per output file
-: read the input from standard input
The last argument is the prefix for the output file names.
ls lists the file names (one per line, since its output goes to a pipe), and split takes 2 names at a time and writes them to files named split_prefixaa, split_prefixab, split_prefixac, and so on, so each of these files contains at most 2 names from the ls output. For example:
Contents of split_prefixab:
my_file1.csv
myfile2.csv
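A small sketch to try the split step in isolation, assuming a shell with mktemp, split and wc available. The five file names are hypothetical and are written to a file so the input is predictable.

```sh
#!/bin/sh
# Demonstrates: split -l 2 - split_prefix (2 names per output file, input from stdin)
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT        # clean up the scratch directory on exit
cd "$work" || exit 1

# five hypothetical file names, one per line, as ls prints them into a pipe
printf '%s\n' a.csv b.csv c.csv d.csv e.csv > names.txt

# "-" makes split read standard input; split_prefix is the output name prefix
split -l 2 - split_prefix < names.txt

for f in split_prefix*; do
  echo "$f:" $(wc -l < "$f") "name(s)"
done
# expected: split_prefixaa: 2, split_prefixab: 2, split_prefixac: 1
```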
Find command
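find . -name . -o -type d -prune -o -type f -mtime +2 -exec mv {} /out/archive \;
-name . : matches the starting directory itself, which is therefore not pruned
-o : Boolean OR
-type d -prune : do not descend into sub-directories, so only files directly under the current directory are considered
-type f : regular files only
-mtime +2 : modified more than 2 days ago (find counts age in whole 24-hour periods, so +2 matches files at least 3 days old)
-exec mv {} /out/archive \; : run mv for each selected file; {} is replaced by the file name and \; ends the command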
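A sketch that checks the non-recursive behaviour of the find expression, assuming touch -t and mktemp are available. The directory and file names are hypothetical; old.csv and sub/old_in_sub.csv are back-dated to January 2020, and only the top-level one should be moved.

```sh
#!/bin/sh
# Move files older than 2 days from the current directory only (not sub-directories)
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/sub" "$work/archive"
cd "$work/src" || exit 1

touch -t 202001010000 old.csv sub/old_in_sub.csv   # last modified 1 Jan 2020
touch new.csv                                       # last modified now

# -name . keeps the start dir, -type d -prune skips sub-dirs, the rest selects old files
find . -name . -o -type d -prune -o -type f -mtime +2 -exec mv {} "$work/archive" \;

ls "$work/archive"    # expected: old.csv
```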
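Xargs command
ls | xargs tar -cvf test.tar
ls lists the file names in the current directory, and xargs appends them as arguments to a single tar -cvf test.tar command, which creates (-c) the archive file (-f) test.tar and lists each file as it is added (-v). If the list is too long for one command line, xargs runs tar more than once, and each run would overwrite test.tar; that is one reason the script feeds xargs only one small batch of names at a time.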
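To see how xargs builds the command line without creating any archive, the sketch below puts echo in front of tar, so the command is printed instead of executed. The file names are hypothetical; -n 1 is the POSIX xargs option that limits each invocation to one argument.

```sh
#!/bin/sh
# All names read by xargs go onto ONE command line by default
printf '%s\n' a.csv b.csv c.csv | xargs echo tar -cvf test.tar
# prints: tar -cvf test.tar a.csv b.csv c.csv

# With -n 1, xargs runs the command once per name instead
printf '%s\n' a.csv b.csv c.csv | xargs -n 1 echo tar -cvf test.tar
# prints three lines; a real tar -c run this way would overwrite test.tar each time
```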
In the script above, xargs is used this way for each split file. The sed command strips any leading ./ from the names that find prints; because SPLITPREFIX is an absolute path, it has no effect here.
for FILENAME in $(/usr/bin/find $SPLITPREFIX* | sed 's/^\.\///'); do
  cat $FILENAME | xargs echo
  # remove the original files only if the tar was created successfully
  cat $FILENAME | xargs tar -cvf $1$TODAY'_'$counter'.tar' && cat $FILENAME | xargs rm -f
  ((counter=counter+1))
done
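An end-to-end sketch of the split-and-tar loop that reproduces the example from the question: 25 files in batches of 10 should give three tar files holding 10, 10 and 5 files. It is a simplified stand-in, not the script itself: a temporary directory replaces /cygdrive/c/..., the directory names archive and lists and the batch_N.tar names are hypothetical, a plain glob replaces find | sed, and $((counter + 1)) is the POSIX equivalent of ksh's ((counter=counter+1)), so it also runs under sh.

```sh
#!/bin/sh
work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT
mkdir "$work/archive" "$work/lists"
cd "$work/archive" || exit 1

# create 25 hypothetical csv files
i=1
while [ $i -le 25 ]; do
  echo "row $i" > "file$i.csv"
  i=$((i + 1))
done

# list files go to a separate directory, like SPLITPREFIX in the script
ls | split -l 10 - "$work/lists/split"     # creates splitaa, splitab, splitac
counter=1
for FILENAME in "$work/lists"/split*; do
  # archive one batch (-v omitted to keep output short); delete originals only if tar succeeded
  xargs tar -cf "batch_$counter.tar" < "$FILENAME" && xargs rm -f < "$FILENAME"
  counter=$((counter + 1))
done

for t in batch_*.tar; do
  echo "$t: $(tar -tf "$t" | wc -l | tr -d ' ') files"
done
# expected: batch_1.tar: 10 files, batch_2.tar: 10 files, batch_3.tar: 5 files
```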
SCRIPTDIR=`pwd`
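The backquotes perform command substitution: the shell runs pwd, which prints the present working directory, and stores its output in the variable SCRIPTDIR. The script uses it to return to its starting directory after changing into each source directory.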
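A minimal sketch of saving and restoring the working directory with command substitution. $(pwd) is the equivalent POSIX form of the backquotes and is easier to nest; /tmp is used only as an example of somewhere else to go.

```sh
#!/bin/sh
SCRIPTDIR=`pwd`          # backquote form, as in the script
SCRIPTDIR2=$(pwd)        # equivalent $( ) form
cd /tmp || exit 1        # move elsewhere, as the script does with cd $SOURCEDIR
echo "now in: $(pwd)"
cd "$SCRIPTDIR" || exit 1
echo "back in: $(pwd)"
```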
The following loop reads zip.cfg one line at a time and assigns the 4 space-separated fields of each line to the four variables.
while read SOURCEDIR TARGETDIR WAITTIME SPLITSIZE
do
  # ....do something here
done < zip.cfg
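A sketch showing how read splits each configuration line into fields. It writes a temporary config file containing the zip.cfg line from above plus a second, hypothetical line, to show that the loop handles one configuration per line.

```sh
#!/bin/sh
cfg=$(mktemp) || exit 1
trap 'rm -f "$cfg"' EXIT
printf '%s\n' \
  '/cygdrive/c/data/csv /cygdrive/c/data/csv/zip +7 10' \
  '/cygdrive/c/data/log /cygdrive/c/data/log/zip +30 50' > "$cfg"   # 2nd line is hypothetical

# read assigns whitespace-separated fields, in order, to the named variables
while read SOURCEDIR TARGETDIR WAITTIME SPLITSIZE
do
  echo "source=$SOURCEDIR target=$TARGETDIR age=$WAITTIME batch=$SPLITSIZE"
done < "$cfg"
# expected:
# source=/cygdrive/c/data/csv target=/cygdrive/c/data/csv/zip age=+7 batch=10
# source=/cygdrive/c/data/log target=/cygdrive/c/data/log/zip age=+30 batch=50
```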
Note: If you are running on Windows, you can practice the above code by downloading MobaXterm, a free Unix emulator for Windows. You need to download the files MobaXterm_Personal_5.0.exe and MobaXterm.ini, and, for the Korn shell, the plugin Pdksh.mxt3. Put all these files in the same folder and create a shortcut to MobaXterm_Personal_5.0.exe to start the MobaXterm window.