I’ve had this particular line in my Tines to-do list for a while. As part of our transition to a new documentation system, another writer and I handled the conversions. We had a high-end tool to help us out, although creating rules was a dicey proposition and the vendor ended up helping (we made our own tweaks where they could make an obvious difference, though).
In the most recent round, we got to the FrameMaker-based docs. Frame (as its users often nickname it) is unique in that it lets you overlay callouts and other graphic elements on top of images. This is a huge help for translating manuals, because the writers don’t have to maintain a separate set of graphics for each language. Anyway, since the new system isn’t FrameMaker, something else had to happen. The conversion system could be configured to either flatten each image (convert it to a PNG, rasterizing the callouts) or create an SVG (Scalable Vector Graphics) file. We chose the latter, thinking that since SVG is an XML format, the new system could maintain them easily.
We were wrong.
Long story shortened considerably, we eventually threw up our hands and decided to convert all the SVGs to “flattened” PNG files. The writers would keep the SVG files on their hard drives to make changes, then upload a new flattened PNG when needed. I wrote a script to do the deed; it crunched through hundreds of SVGs at about one per second, and updated all the links in the book to point to the new PNGs.
All well and good, until one of the writers went to publish. “The images look blurry,” she told me. I took a look, and she was obviously right. It took me about three seconds to figure out why.
You see, our SVG files have a width attribute, which was set to the width in the original FrameMaker files (a typical width is 576 pixels, which at 96dpi is 6 inches even). All well and good, but the original images run about 1200 pixels wide, so in essence we were throwing away over ¾ of the image data when doing the conversion. No wonder it looked blurry! (A quick check on one file, shown after the list below, makes the mismatch obvious.) But we were all weary of messing with it by that point; I had written scripts that:
- extracted embedded images from an SVG, converted them to PNG, then changed the link so the SVG referred to the file instead
- went the other way, embedding images in an SVG
- converted the entire mess to PNG in one swell fwoop
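For the curious, here’s the quick check I mentioned above. The file names are hypothetical stand-ins, but the output of file is typical of what I was seeing:

```
$ grep -o 'width="[^"]*"' some-figure.svg | head -1
width="576"
$ file some-figure-base.png
some-figure-base.png: PNG image data, 1200 x 800, 8-bit/color RGB, non-interlaced
```

Scaling 1200 pixels down to 576 keeps only 576/1200 = 0.48 of the width, and the same ratio applies vertically, so roughly 0.48 × 0.48 ≈ 23% of the original pixels survive the flattening. That’s where “over ¾” comes from.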
The documentation work that was my primary job function had been back-burner’ed for too long. I added an “investigate this further” item to my backlog list and got back to the bread-and-butter part of my job.
This week, I all but cleared a fairly long to-do list in three days, so I thought maybe I could give this thing another shot. A quick Google turned up some promising code on superuser.com; I divided the real image width by the scaled-down width in one SVG, applied the script, and got a nice sharp image! The only problem: doing each file by hand would take about 10 minutes, and there are hundreds. A script is the only practical way to blast through all of them.
When I tackle a situation like this, I tend to use a shell script to drive awk, Perl, and XSLT scripts. Each has its strengths, and trying to force (say) XSLT to work some of awk or Perl’s string-processing magic is more trouble than it’s worth. And vice versa. So… XSLT to extract the file name and (scaled) width from each SVG, awk to parse the output of file (a utility that identifies a file’s type and, for images, reports their dimensions) and do the calculations, all wrapped up in a shell script to conduct the Geek Orchestra.
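Here’s a rough sketch of how I expect the pieces to fit together. Consider it thinking out loud rather than the finished script: it assumes the SVGs link to their bitmaps with an xlink:href (rather than embedding them), that the width attribute is in plain pixels, and that rsvg-convert does the final rendering. The stylesheet and file names are placeholders.

```
#!/bin/sh
# Rough sketch only -- assumes xsltproc, file, awk, and rsvg-convert are installed,
# and that each SVG references its bitmap via <image xlink:href="...">.

# Tiny stylesheet: print the SVG's (scaled) width and the bitmap it points at.
cat > /tmp/extract-info.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svg="http://www.w3.org/2000/svg"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <xsl:output method="text"/>
  <xsl:template match="/svg:svg">
    <xsl:value-of select="@width"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select=".//svg:image/@xlink:href"/>
  </xsl:template>
</xsl:stylesheet>
EOF

for f in *.svg; do
  # e.g. "576 some-figure-base.png" -- scaled width, then the linked bitmap
  set -- $(xsltproc /tmp/extract-info.xsl "$f")
  svgwidth=${1%px}      # strip a trailing "px" if the width carries units
  bitmap=$2

  # file(1) reports something like "PNG image data, 1200 x 800, ...";
  # awk fishes out the real width and divides it by the scaled-down width.
  zoom=$(file "$bitmap" | awk -v sw="$svgwidth" '
    { for (i = 2; i <= NF; i++)
        if ($i == "x") { printf "%.4f", $(i-1) / sw; exit } }')

  # Re-render the SVG at full size instead of the scaled-down width.
  rsvg-convert -z "$zoom" -o "${f%.svg}.png" "$f"
done
```

If some widths turn out to carry other units, or a few SVGs still embed their bitmaps as data URIs, the stylesheet and the width-stripping will need a little more love. That’s tomorrow’s problem.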
Of course, I ran out of time this afternoon to put the whole thing together, but I have all the sub-script logic down. I just need to score the symphony. That will likely take me until noon tomorrow; then I’ll be back to bugging people who are already bogged down with too much of their own stuff, asking them to lend me their expertise.
I also achieved Inbox Zero at work today… and that’s a rant for another time.