Using Sed to Process a Set of Files

In Creating a File Set Processing Script with Vim, Vim was used to quickly create a script for processing a set of files. Using Vim to generate the script works well if the operation only needs to be done once. However, if you have to perform the same operation later on another set of files, you would have to repeat the whole process. If you have Sed installed, the process can be simplified so that you can do it all on the command line.

The difference between Sed and Vim is that Sed is a stream editor and runs entirely on the command line. There is no visual editor with Sed. The stream can come from a pipe, standard input or a file. If you would like an introduction to Sed, you can read An introduction to sed or Sed – An Introduction and Tutorial.

Sed has a substitute command that is very similar to the one found in Vim, and again, this is used to perform the file processing. According to the documentation of Sed’s version of the command, its syntax is:

s/regexp/replacement/flags

Here s denotes the substitute command, regexp is the pattern to search for, replacement is what the matched content will be replaced with, and flags are options that can be turned on or off. The regular expressions supported by Sed are also very similar to those available in Vim. You can find more information in Sed’s regular expressions documentation.
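As a quick illustration of the syntax, here is a minimal sketch that runs the substitute command on a line of text supplied by echo. The sample text and the variable names are made up for this example.

```shell
# With no flags, only the first match on each line is replaced.
first_only=$(echo 'one.tmp two.tmp' | sed 's/tmp/dat/')
echo "$first_only"

# The g flag replaces every match on the line.
all_matches=$(echo 'one.tmp two.tmp' | sed 's/tmp/dat/g')
echo "$all_matches"

# \(...\) captures part of the match; \1 reuses it in the replacement.
swapped=$(echo 'first.tmp' | sed 's/\(.*\)\.tmp/\1.dat/')
echo "$swapped"
```

The three commands should print "one.dat two.tmp", "one.dat two.dat" and "first.dat" respectively.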

In Creating a File Set Processing Script with Vim, the contents of the directory were read into Vim using r! and ls (or dir in DOS; unlike the previous post, this one mainly uses Linux commands instead of DOS), and the following substitute command was used to convert the list into a script:

%s/\(\(.*\)\.tmp\)/mv \1 \2.dat/

So, to generate the same rename commands with Sed, we pipe the output of ls to Sed, which executes a substitute command similar to the one above. The shell command for this is shown below (I have split it over multiple lines to make it fit better; it should still work if you copy and paste it into a Linux shell on a system with Sed installed):

ls *.tmp \
    | sed 's/\(\(.*\)\.tmp\)/mv \1 \2.dat/'
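The nested groups in this command can be confusing at first, so here is a small sketch of how they are numbered. It uses printf in place of ls so that it can be run in any directory; the variable names and the alternative form using & are my own additions, not part of the original command.

```shell
# Groups are numbered by their opening \( from left to right:
# \1 is the outer group (the whole name), \2 the inner group (the name without .tmp).
groups=$(printf '%s\n' first.tmp | sed 's/\(\(.*\)\.tmp\)/outer=\1 inner=\2/')
echo "$groups"

# An equivalent rename command that uses & (the whole match) instead of the
# outer group. The added $ anchors .tmp to the end of the line.
with_amp=$(printf '%s\n' first.tmp | sed 's/\(.*\)\.tmp$/mv & \1.dat/')
echo "$with_amp"
```

The first command should print "outer=first.tmp inner=first", and the second "mv first.tmp first.dat".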

Suppose a directory contains the files listed in the following ls output:

kah@linux-6s6e:~/temp> ls
first.tmp  second.tmp  third.tmp

Executing the command produces the following output:

kah@linux-6s6e:~/temp> ls *.tmp \
    | sed 's/\(\(.*\)\.tmp\)/mv \1 \2.dat/'
mv first.tmp first.dat
mv second.tmp second.dat
mv third.tmp third.dat

Having seen that the command produces the right output, we can execute the generated commands. You could redirect the output to a file by adding “>[script file]” to the end of the command and then run that file (if the new file is not executable, run it with “sh rename” instead of “./rename”), similar to what is done in the following:

kah@linux-6s6e:~/temp> ls *.tmp \
    | sed 's/\(\(.*\)\.tmp\)/mv \1 \2.dat/' \
    > rename
kah@linux-6s6e:~/temp> ./rename
kah@linux-6s6e:~/temp> ls
first.dat  rename  second.dat  third.dat

Alternatively, in Linux, you can pipe the output from Sed directly to another shell. To use this method, the command looks something like this:

ls *.tmp \
    | sed 's/\(\(.*\)\.tmp\)/mv \1 \2.dat/' \
    | sh

This way you do not have to create another script file to execute. The following terminal output shows this method in action:

kah@linux-6s6e:~/temp> ls
first.tmp  second.tmp  third.tmp
kah@linux-6s6e:~/temp> ls *.tmp \
    | sed 's/\(\(.*\)\.tmp\)/mv \1 \2.dat/' \
    | sh
kah@linux-6s6e:~/temp> ls
first.dat  second.dat  third.dat

If you are going to pipe to the shell, you should first check the output from Sed (by leaving off the final pipe to sh). This way, you can see the commands that will be executed in the shell.
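Checking the output matters most when file names contain spaces, because an unquoted name is split into several arguments once the shell runs the generated command. Below is a rough sketch of one way to guard against that by wrapping each name in double quotes. The function name quote_renames and the sample file name are hypothetical, and the sketch assumes the names contain no double quotes, dollar signs, backticks, backslashes or newlines.

```shell
# quote_renames: hypothetical helper that reads file names on standard input
# and prints one mv command per name, with both names in double quotes.
# Assumption: names contain no ", $, backtick, backslash or newline characters.
quote_renames() {
    sed 's/\(\(.*\)\.tmp\)$/mv "\1" "\2.dat"/'
}

# printf stands in for ls so the example runs in any directory.
preview=$(printf '%s\n' 'my notes.tmp' | quote_renames)
echo "$preview"
```

This should print mv "my notes.tmp" "my notes.dat", which the shell treats as a single rename instead of four separate arguments.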

Finally, if the command is used repeatedly, you could consider writing a shell script for it. For the example that has been used in this post, the following shell script will change files with a particular extension to another extension:

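#!/bin/sh
# Rename every file ending in .$1 so that it ends in .$2 instead.

if [ "$#" -eq 2 ]
then
    # Build one mv command per file, then hand the commands to sh.
    ls -1 *."$1" \
        | sed 's/\(\(.*\)\.'"$1"'\)/mv \1 \2.'"$2"'/' \
        | sh
else
    echo "Usage: $0 [current extension] [new extension]"
fi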

The following terminal output shows the script in action:

kah@linux-6s6e:~/temp> ls
first.tmp  second.tmp  third.tmp
kah@linux-6s6e:~/temp> ../changeext.sh tmp dat
kah@linux-6s6e:~/temp> ls
first.dat  second.dat  third.dat
kah@linux-6s6e:~/temp> ../changeext.sh dat lst
kah@linux-6s6e:~/temp> ls
first.lst  second.lst  third.lst
kah@linux-6s6e:~/temp> ../changeext.sh lst txt
kah@linux-6s6e:~/temp> ls
first.txt  second.txt  third.txt

Notice that in the above, I invoke the script with “../changeext.sh”, since I named the script file “changeext.sh” and it was in the parent directory of the directory that I was in at the time. If you decide to copy and use the script, this will have to change according to the directory where you put the script (unless that directory is in your environment’s PATH) and the file name you save it as. Also, notice from the output that the script is able to change any extension to any other extension. This is much simpler than having to rewrite the rename script in Vim or retype the entire Sed command!
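Since the script pipes straight to sh, it can be handy to keep the command-generating step separate so the commands can be previewed first. The following is only a sketch of that idea; changeext_preview is a hypothetical function name, and printf again stands in for ls.

```shell
# changeext_preview: hypothetical helper, not part of the script above.
# Reads file names on standard input and prints the mv commands that would
# change extension $1 to extension $2, without executing anything.
changeext_preview() {
    sed 's/\(\(.*\)\.'"$1"'\)$/mv \1 \2.'"$2"'/'
}

commands=$(printf '%s\n' first.tmp second.tmp | changeext_preview tmp dat)
echo "$commands"

# To run the commands for real once they look right:
#   ls -1 *.tmp | changeext_preview tmp dat | sh
```

The preview should list "mv first.tmp first.dat" and "mv second.tmp second.dat", one per line.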

Although I have stuck with batch renaming for this post, it is possible to use this technique to perform other tasks with a set of files. Examples of other uses may include signing a series of certificate requests (with OpenSSL) to generate certificates, or processing a batch of images with ImageMagick. The main change that you have to make to do these other things is in the replacement value specified in the substitute command.
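As a rough sketch of the image case, the replacement below generates commands for ImageMagick's convert program instead of mv. It assumes that convert is the ImageMagick command available on your system and that a plain PNG to JPEG conversion is what you want; the file names are made up, and the commands are only printed, not executed.

```shell
# Generate (but do not run) one ImageMagick command per PNG name.
# printf stands in for ls; the file names are made up for illustration.
convert_cmds=$(printf '%s\n' photo1.png photo2.png \
    | sed 's/\(\(.*\)\.png\)$/convert \1 \2.jpg/')
echo "$convert_cmds"
```

This should print "convert photo1.png photo1.jpg" and "convert photo2.png photo2.jpg". As before, the commands could be piped to sh once they have been checked.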
