A friend of mine came to me with a problem the other day. He had an HTML document that displayed lots of pictures, all of which had empty alt attributes. The alt attribute holds the alternative text that is shown when an image cannot be displayed and that screen readers read aloud; the text shown on hover comes from the title attribute. What he wanted was to fill in all of these alt attributes with incrementing numbers, starting from 001 and going on to 002, 003 and so on. I figured awk would be the best way to achieve this, considering I needed to increment a value.
Instead of using the actual file he supplied me, let's create a quick mockup similar to the one I did the initial work on:
<!doctype html>
<html>
<head>
<title></title>
</head>
<body>
<ul class="gallery">
<li><a class="colorbox" href="images/test213.jpg"><img src="images/thumbnails/test213.jpg" alt=""></a></li>
<li><a class="colorbox" href="images/test158.jpg"><img src="images/thumbnails/test158.jpg" alt=""></a></li>
<li><a class="colorbox" href="images/test2.jpg"><img src="images/thumbnails/test2.jpg" alt=""></a></li>
<li><a class="colorbox" href="images/test6.jpg"><img src="images/thumbnails/test6.jpg" alt=""></a></li>
<li><a class="colorbox" href="images/test90.jpg"><img src="images/thumbnails/test90.jpg" alt=""></a></li>
</ul>
</body>
</html>
The filenames were of no use. If they had followed a logical numbering, I could probably have used them as a reference for the alt attributes. Instead, I needed a counter that I would increment each time I came across one of these lines.
The one thing these lines all have in common, and that I couldn't find elsewhere in the file, is class="colorbox". With this piece of information we can make sure that we only match lines that contain the word colorbox. The proper awk syntax to match these lines and perform some action is simply /colorbox/ { ... }
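To see the pattern-action form in isolation, here is a minimal sketch that prints only the matching lines together with their line numbers. The sample input and the helper name show_matches are my own stand-ins for illustration, not part of the original file or script.

```shell
# Hypothetical sample: one gallery line surrounded by two unrelated lines.
sample='<ul class="gallery">
<li><a class="colorbox" href="images/test213.jpg"><img src="images/thumbnails/test213.jpg" alt=""></a></li>
</ul>'

# show_matches is an illustrative helper name (an assumption, not from the post).
show_matches() {
  # The action block runs only for lines that match /colorbox/.
  # NR is awk's built-in number of the current input line.
  awk '/colorbox/ { print NR ": " $0 }'
}

printf '%s\n' "$sample" | show_matches
```

Only the second line of the sample is printed, because it is the only one that contains colorbox.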
What I ended up with was the following
#!/usr/bin/awk -f
/colorbox/ {
i = sprintf("%03d", ++i)
sub("alt=\"\"", "alt=\""i"\"")
}; 1
- match the text "colorbox" and open a block
- increment the variable i and format it as a 3-digit number (the first time the value will be 001; on the next match awk converts the string "001" back to a number before incrementing it)
- alt="" will be replaced with alt="[value of i]"
- end the block and add ; to separate the commands
- awk will process one line at a time, and 1 is a shortcut for {print}, which will print the current line. Since we are outside of the block this will print every line, including the ones where the substitution was performed
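As a sketch of an alternative, the same steps can be written with the counter kept separate from the formatted string, so the variable never switches between number and string. The variable name n and the wrapper function number_alts are assumptions of mine; on the mockup the result should be the same as with the script above.

```shell
# number_alts is a hypothetical wrapper name; it reads files given as
# arguments, or standard input when called without arguments.
number_alts() {
  awk '
    /colorbox/ {
      n++                                            # numeric counter, starts at 1
      sub(/alt=""/, sprintf("alt=\"%03d\"", n))      # first empty alt on the line
    }
    1                                                # shorthand for { print }
  ' "$@"
}

# Hypothetical two-image sample, trimmed down from the mockup.
printf '%s\n' \
  '<ul class="gallery">' \
  '<li><a class="colorbox" href="images/test213.jpg"><img src="images/thumbnails/test213.jpg" alt=""></a></li>' \
  '<li><a class="colorbox" href="images/test158.jpg"><img src="images/thumbnails/test158.jpg" alt=""></a></li>' \
  '</ul>' | number_alts
```

Note that sub only replaces the first match on a line, so both this sketch and the original script assume one image per matching line.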
Let's run the script and output to a new file
% awk -f script.awk gallery.html > gallery2.html
And the result
<!doctype html>
<html>
<head>
<title></title>
</head>
<body>
<ul class="gallery">
<li><a class="colorbox" href="images/test213.jpg"><img src="images/thumbnails/test213.jpg" alt="001"></a></li>
<li><a class="colorbox" href="images/test158.jpg"><img src="images/thumbnails/test158.jpg" alt="002"></a></li>
<li><a class="colorbox" href="images/test2.jpg"><img src="images/thumbnails/test2.jpg" alt="003"></a></li>
<li><a class="colorbox" href="images/test6.jpg"><img src="images/thumbnails/test6.jpg" alt="004"></a></li>
<li><a class="colorbox" href="images/test90.jpg"><img src="images/thumbnails/test90.jpg" alt="005"></a></li>
</ul>
</body>
</html>
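A quick way to double-check the result is to count how many lines still have an empty alt attribute and how many received a three-digit number. This is only a sketch: the helper name check_alts and the sample lines are hypothetical, and it counts lines rather than individual attributes, which is fine for a file with one image per line like the mockup.

```shell
# check_alts is a hypothetical helper; it reads files given as arguments,
# or standard input when called without arguments.
check_alts() {
  awk '
    /alt=""/                  { empty++ }      # lines still missing a value
    /alt="[0-9][0-9][0-9]"/   { numbered++ }   # lines with a 3-digit alt
    END { printf "numbered=%d empty=%d\n", numbered, empty }
  ' "$@"
}

# Hypothetical sample: two numbered images and one that was missed.
printf '%s\n' '<img alt="001">' '<img alt="">' '<img alt="002">' | check_alts
```

To run it against the generated file, pass the file name as an argument: check_alts gallery2.html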