
Wednesday, October 21, 2020

Adventures of a #techcomm geek: Constants Aren't, Variables Won't

One advantage of a DITA-based workflow for technical writing is how well it handles translation. During the acquisition binge that ended with us being on the “bought” end, we picked up a product with a fairly strong retail presence. You’ve probably seen those products in Best Buy and similar places, and maybe even bought one to upgrade your home network. (No, I’m not going into details, because I don’t write documentation for that line… mostly.)

But, as usual, I digress. Retail products, or non-retail products that are supplied to the end-user, need to have localized documentation: that is, not just in the native language, but using country-specific idioms (although this might go a little too far). And, to help with consistency, things like notes or cautions use canned strings.

The DITA Open Toolkit (DITA-OT) PDF plugin provides a pretty good list of canned “variable” strings for a bunch of different languages, including languages with non-Latin glyphs. Of course, we added to that list… somewhat. I put quotes on “variables” because I don't know why they call them variables; they are basically language-specific constants. Local Idiom, I suppose.

Fast-forward a couple years, to the disease-ridden hellscape that most refer to as “2020.” A year ago, one of the point people for translations sat two aisles down from me, on those days we weren’t both working from home. We would have hashed half of this out in person, before roping in a bunch of other people in a long email chain. (Don't get me wrong, working remote is da bomb, and I hope they don’t expect me to do time in the office in the future… but it had the occasional upside.)

Anyway, this was the first Brazilian Portuguese translation we had done in a while, and weird things were happening. My initial guess—that we had provided updated strings for only a subset of languages (mostly French and German)—turned out to be correct, when I started poking around in the source. I remembered working on a script to parse the XML-based “variable” files to build a spreadsheet, so we could easily see what needed updating. Turns out, I had either given up or got pulled away after the script was less than a quarter-baked (let alone half). I beamed my brain power at the cursed XSLT file, and it finally turned brown and gave me the output I wanted: name[tab]value.
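For anyone who wants to try the same extraction without XSLT, here is a minimal sketch in Python. It is not the script described above, just the same idea under stated assumptions: each canned string is assumed to live in a `<variable id="…">` element, nested markup inside a string is flattened to plain text, and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample; real files will be longer and may carry nested markup.
SAMPLE = """<vars xmlns="http://www.idiominc.com/opentopic/vars">
  <variable id="Note">Nota</variable>
  <variable id="Caution">Cuidado</variable>
</vars>"""


def variables_to_rows(xml_text):
    """Return (name, value) pairs for every <variable id="..."> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip anything that is not a regular element
        # Compare on the local name so the namespace URI does not matter.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "variable" and "id" in elem.attrib:
            # Flatten child markup and collapse runs of whitespace.
            value = " ".join("".join(elem.itertext()).split())
            rows.append((elem.attrib["id"], value))
    return rows


def rows_to_tsv(rows):
    """Format the pairs as name<TAB>value, one per line."""
    return "".join(f"{name}\t{value}\n" for name, value in rows)


if __name__ == "__main__":
    print(rows_to_tsv(variables_to_rows(SAMPLE)), end="")
```

Flattening is a shortcut: any string that relies on embedded elements loses them, so treat the output as something to review, not something to round-trip.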

Now I was halfway there. I had tab-delimited files for each language; now I just needed to coalesce them into a single (again, tab-delimited) file. As I’m fond of saying, when I want to process a big wad of text, awk is how I hammer my nails… and I started pounding.

Since I had an anchor point—the “variable” names that were constant for each language—it was a Small Matter of Programming. Knowing that English (en) was the most complete language helped; I used it as a touchpoint for all the other languages. After a few fits and starts, the script produced the output I needed and I imported it into Excel. Blank cells that needed values, I highlighted in dark red. Things I needed to personally tweak here and there got yellow highlighting. I hid rows that didn’t need attention (some were complete across the board, others we don’t use), and sent it to the rest of the team.
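The merge itself can be sketched in a few lines. This is a Python stand-in for the awk script, not a copy of it: the function names, the language codes, and the sample strings are all made up for illustration. The one idea carried over from the post is that English drives the row list and every other language fills in its own column, leaving a blank cell where a string is missing.

```python
def parse_tsv(text):
    """Turn name<TAB>value lines into a dict, skipping blank lines."""
    table = {}
    for line in text.splitlines():
        if line.strip():
            name, _, value = line.partition("\t")
            table[name] = value
    return table


def merge_languages(tables, anchor="en"):
    """Build one TSV: a header row, then one row per name in the anchor language."""
    langs = [anchor] + sorted(lang for lang in tables if lang != anchor)
    lines = ["\t".join(["name"] + langs)]
    for name in tables[anchor]:
        # A missing translation becomes an empty cell.
        cells = [tables[lang].get(name, "") for lang in langs]
        lines.append("\t".join([name] + cells))
    return "\n".join(lines) + "\n"


def missing_cells(tables, anchor="en"):
    """List (name, language) pairs that still need a value."""
    return [
        (name, lang)
        for name in tables[anchor]
        for lang in sorted(tables)
        if lang != anchor and not tables[lang].get(name)
    ]


if __name__ == "__main__":
    tables = {
        "en": parse_tsv("Note\tNote\nCaution\tCaution\n"),
        "fr": parse_tsv("Note\tRemarque\nCaution\tAttention\n"),
        "pt_BR": parse_tsv("Note\tNota\n"),
    }
    print(merge_languages(tables), end="")
    print(missing_cells(tables))
```

The `missing_cells` list is the programmatic cousin of the dark red highlighting: the same gaps, minus the color.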

Just to be complete, I finished the day by embedding the XSLT and awk scripts inside a shell script (and testing the results). If I need to do this again, and I probably will, I can do it in a matter of minutes instead of spending an entire day on it.

I deliberately formatted the spreadsheet so I can export changes to TSV (tab-separated values) and write another script to rebuild the language “variables” if I feel it’s necessary. It’s always good to anticipate future requests and be ready for them.
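That rebuild script doesn't exist yet, but a rough Python sketch shows it would not take much. Everything here is an assumption: a header row of `name` followed by language codes, a bare `<vars>` wrapper with no namespace, values stored as plain text, and a spreadsheet export that does not wrap cells in quotes.

```python
from xml.sax.saxutils import escape, quoteattr


def tsv_to_vars(tsv_text, lang):
    """Rebuild one language's strings from the merged TSV (hypothetical layout)."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    col = header.index(lang)  # raises ValueError if the language has no column
    out = ["<vars>"]
    for line in lines[1:]:
        cells = line.split("\t")
        value = cells[col] if col < len(cells) else ""
        if value:  # leave untranslated strings out instead of writing empties
            out.append(
                f"  <variable id={quoteattr(cells[0])}>{escape(value)}</variable>"
            )
    out.append("</vars>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    merged = "name\ten\tpt_BR\nNote\tNote\tNota\nCaution\tCaution\t\n"
    print(tsv_to_vars(merged, "pt_BR"), end="")
```

Skipping empty cells is a design choice: a missing string stays visibly missing in the output file instead of showing up as a blank label in the finished document.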
