Looking for writing-related posts? Check out my new writing blog, www.larrykollar.com!

Wednesday, November 02, 2022

Adventures of a #techcomm geek: Still I look to find a reason

A future project—I plan to start on it in earnest early next year—is requiring a specific set of “reset reason codes.” The codes, and their meanings, are described in MULPI (a standard specification in the industry I work in)… at least, so said the ticket the developers are using to track this piece of the melange that makes up modern communication devices. I had a PDF copy of MULPI on my work system already, and once I realized the spec said “initialization reason” where my co-workers used the more direct “reset reason,” I found the table in section C.1.3.6. Hey, it’s only 887 pages.

Needle in a haystack

Besides it being a table—I hate tables, in general—I found two glaring issues:

  • I wanted the number in the left column, not the right
  • The table did not describe the conditions that would cause the reset
The latter was a matter of searching the PDF on initialization reason, but first I wanted to reverse the columns in that table. Copying the table out of the PDF, and pasting into a terminal window, gave me one line per cell:
Initialization Reason
Initialization Code
POWER-ON
1
T17_LOST-SYNC
2
ALL_US_FAILED
3
BAD_DHCP_ACK
4
etc
Once again, I had a textual nail, and I reached for my hammer: awk. Redirecting the lines into a junk file (literally called junk), I decided to create tab-delimited output with the column order reversed:
awk '{getline num; print num "\t" $0; next}' junk
Mirabile dictu, it worked the first time! The getline function does a sort of look-ahead, grabbing the next line of input before the normal awk loop can get to it. So it’s pretty easy to grab two lines, print them in reverse order, and move on to the next pair.

“Oh,” I then thought, “I should have made it a Markdown table.” Rather than start over, I just piped the output of the first awk script into another one:
awk '{getline num; print num "\t" $0; next}' junk | \
awk 'BEGIN {FS="\t"} NR==2 {print "|-----|-----|"} \
   {print "|", $1, "|", $2, "|"}'
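For the sample cells above, the output of the pipeline looks like this (the rest of the rows follow the same pattern):
| Initialization Code | Initialization Reason |
|-----|-----|
| 1 | POWER-ON |
| 2 | T17_LOST-SYNC |
| 3 | ALL_US_FAILED |
| 4 | BAD_DHCP_ACK |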
Once again, first time was the charm! This “long pipeline of short awk scripts” approach does make debugging easier, especially if you don’t have to do any debugging. If you’re not familiar with awk, let me pretty up that second script to make it easier to follow:
BEGIN {
    FS="\t"
}

NR==2 {
    print "|-----|-----|"
}

{
    print "|", $1, "|", $2, "|"
}
The BEGIN block gets executed before the script reads its first line of input. In this case, it sets the field separator (the character or regular expression that breaks a line into fields) to the tab character.

The block beginning with NR==2 applies only to the second line of input (NR = “record number” or line number). In this case, it prints a Markdown separator (between table heading and body) before processing the second line… remember, awk takes each pattern/action pair in order. Since there is no next statement, it falls through to the default action.

The last block is the default action, since it has no pattern to trigger its use. It puts Markdown table cell separators on each end of the line, and between the two fields. The commas insert the output field separator (a space, by default). I could get the same result with:
print "| " $1 " | " $2 " |"
So I copied the Markdown-formatted table from the terminal window and pasted it into a text editor. From there, adding a third column was easy. Not so easy, or at least somewhat more tedious, was searching the PDF for “initialization reason” and adding the conditions triggering each reason code to the table. In some cases, there are multiple issues for a particular reason. In two cases, there was nothing in the spec at all about the reason code. Fortunately, I was familiar with one of the causes and the other was straightforward.

The Markdown has been converted to DITA, and is now waiting for the project to get started in earnest. And it won’t be bugging me to deal with it over the next couple months.

Friday, December 17, 2021

If the Kludge works, use it

Kludge. Jury-rig (or the variant, jerry-rig). Lash-up. Sunshine Engineering (named for Mr. Sunshine, who bodged lots of things together that I had to straighten out later). Whatever you call it, including the racist ones nobody should have used in the first place, it’s (so the dictionary says) “an ill-assorted collection of parts assembled to fulfill a particular purpose.” Sometimes, the kludge is a necessity; a critical piece of equipment failed, deadlines are looming, and there’s no way to run to a nearby store to find what you need.

So… last week, someone called the wife. Her video business, which hasn’t had any significant income in two or three years, still has a listing in the Yellow Pages. “Can you put VHS video onto DVDs?” As she has done that before, she took the gig. Last time, we hooked a VCR into her commercial-grade DV deck and rolled tape. This time… not so much. The deck wouldn’t stay powered up, and wouldn’t open the tray (she was just passing through, but wanted the tape in there to be out so it wouldn’t interfere). She opined the deck got fried by one of the close lightning strikes we’ve had from time to time, and I couldn’t dismiss the possibility. Meanwhile, I was trying to find our cache of RCA-to-BNC adapters. We found plenty of the BNC-in/RCA-out types, but we needed the opposite. Mal-Wart dot commie carries them, but “not sold in stores.” The Mighty Zon could get us some by Thursday… it was Tuesday, and the wife was in DO SOMETHING NOW mode because she wanted to wrap this up by Friday. But with the DV deck apparently fried, there wasn’t any urgency to get the connectors anyway.

Seems to work fine for old VHS tapes
But! As the wife snarled something about taking her client’s tapes back to her, I remembered finding something else when looking for the adapters: my old Canon ZX-80 camcorder. When I say old, I mean I took footage of The Boy’s high school football games, so at least 15 years. Probably more like 17. The imager crapped out on it a long time ago, but I kept it around because I could at least play tapes into a Firewire connection, view on a tiny screen, or a bigger screen if I plugged in the included DV cable. In other words, it could do everything but take video itself. Being a tech writer, I kept the manual with the camcorder. There were lots of functions I never explored, and just in case…

Well, just in case arrived Tuesday night. I went down the table of contents, and found the vindication of my packrat ways on page 84: Converting Analog Signals to Digital Signals (Analog/Digital Converter). In this mode, you:

  • connect a VCR to the camcorder’s DV cable
  • connect a Firewire cable to the Mac
  • start capture on the Mac, play the tape, and relax
Sure, it’s a kludge, but time was tight. We have better camcorders, but I didn’t have time to find the manuals to figure out how to use them as video digitizers. Here, we hit Snag #2 (or is it #3? I lost count quickly): Final Cut Pro wasn’t capturing the video, or even seeing it. Looking at the documentation, I guessed FCP was too snooty to deal with an early-century consumer camcorder. So I tried iMovie, and iMovie told me:

[screenshot of iMovie’s response (link to original)]

So I guess iMovie is friendlier to the older, not so well-heeled, camcorders of the world. Since FCP has no problem importing iMovie assets, we were on the way.

Or so I thought. Snag (int(rand()*4))+3 came in this afternoon. Wife was again ranting about giving up and taking the tapes back to the client, because she couldn’t burn a DVD. I seem to remember us using Compressor to create MPEG-2 video, then using Toast to burn that, but we couldn’t remember the incantations and she wasn’t inclined to take the extra step. So I started troubleshooting. The DVD Burner app icon started bouncing, so I clicked it. “Couldn’t burn (click here for more information).” Clicking the helpful link told me what the initial “insert dual-layer disc” message should have told me in the first place: the video was too long to fit.

Solution: cut the video to 80 minutes so it fits on a DVD. Splitting video and moving it around is one of FCP’s strong suits, so the wife got to work on it.

As I type (10:30pm Friday evening), she has all the video on the system, and two or three DVDs burned. She’s behind schedule, but has a clear path to completion.

If the kludge works, use it… at least until you get a more elegant solution in hand. We’ll look into replacing the DV deck—looks like we might find something that works for around $400—and if she’s going to start back on her video work, it will pay for itself soon enough. Then, maybe, we can look at modernizing the intake end of things (i.e., the commercial-grade camcorders that are at least as old as that ZX-80). As I understand it, a lot of the newest models skip the tape drive and go directly to an SD card with some enormous amount of capacity. If that’s true, importing would mean sticking the SD card into the back of the iMac and copying the file. At that point, who needs a DV deck?

Tuesday, December 07, 2021

Computer-Aided Weeding

A couple weeks ago, I finally decided to start pulling all the notes I’d saved up in Evernote and Google Keep into Logseq. I started with Evernote, just because.

First, I had to update the Evernote app on my iMac, so I could actually access my stuff. That should tell you how long it’s been since I actively used it.

After exporting, I used a utility called Yarle to convert the notes in each notebook to Markdown.

Now the hard part: deciding what I wanted to keep, and what to toss. The even harder part: cleaning up the sloppy mess that most of those individual pages were—over 400 of them. Cleaning them up in Logseq was do-able, but slow. Lots of repeated stuff. This wasn’t a job for an outliner; it was a job for a high-powered text editor like Vim or Atom.

Unlike Vim, Atom sports a sidebar that displays all the files in the directory, and its regular expression parser recognizes newlines. So I could find blank entries using the expression ^- *\n (which means, “look for a line starting with a dash, followed by zero or more spaces, then a newline”) and get rid of them.

But the even bigger time-saver: realizing a lot of those entries were long outdated (some dated back to 2013) and deleting them. By the time I was done with that pass, I had 109 “keepers” left. From there, it was a matter of applying search and replace to fix common issues.

So: 3/4 of the pages got deleted, along with much of the boilerplate stuff from the remaining pages (I just need the content, the source, and some info about the author). That left my assets folder with 4852 items in it, most of them no longer being linked to.

Now… am I going to make 4852 passes through my pages, by hand, to see if a pic can be deleted?

The shell (aka Terminal) is my machine gun for blasting a job like this.

# assume we're in the assets directory
mkdir -p ../assets_removed
for i in *; do
  grep -q "$i" ../pages/* || mv "$i" ../assets_removed
done

Let’s pick this apart, for those who need it.

The first line is just a comment. An important one, all the same. You need to be in your Logseq database’s assets directory for this to work correctly. BAD THINGS will happen otherwise! One of the nice things about using MacOS: if I eff something up, I can pull it out of the Time Machine backup and try again.

Next, we make a directory called assets_removed at the same level as the assets directory. Just in case we make a mistake, you know. The -p option is there to make the script shrug and move on if the directory already exists (if we’ve been here before, for example).

The third and fifth lines begin and end a loop, going through each of those >4800 graphic files.

Inside the loop, we search for the file name in the pages. The -q option is exactly what you want for a script like this; it returns success if grep finds the string and failure otherwise. The || (two vertical bars) means “execute the next part if it fails” (in this case, fails to find the file name)… and the next part moves the unused file to the assets_removed directory.

And I ended up with 255 files (out of nearly 5000) that were actually being used. The other ones are out of the way, and can be safely deleted once I verify that none of them are needed.

[UPDATE: After stepping through the pages again, I found 18 “false negatives” that had to be dragged back into the assets folder. That’s why you move them out of the way, instead of just nuking them.]

It took about a minute to grind through the assets directory, and a couple of minutes to set up the script, but that beats the heck out of hours (or days) doing it by hand! I’m fond of saying, I’m lazy enough to get the computer to do my work for me. It doesn’t always pay off this big, but it does pay off.

Off to get the Google Keep notes…

Monday, June 21, 2021

The next generation of information management

Some 15 years ago, I was enjoying Journler. The thing I really liked about it back then was, it could post an entry straight to Blogger. Of course, Google likes to screw around with Blogger. They broke the posting mechanism that Journler used, and now they’re concentrating on keeping it deliberately broken for Safari users. Over the years, Journler slowly sank into the morass of apps that got left behind by advances in MacOS. I continued to use it to capture flash fiction, scenes, and chunks of longer stories, until it became unusable. The source code for Journler has been available on Github for a long time, but I just now found out about it.

But I digress. As Journler wheezed and died, I tried a variety of paper and app systems to capture stuff I wanted to come back to later. Evernote was okay, until they crippled the free version to support only two devices (previously five). Google Keep held my interest for a long time; but I’m trying to extricate myself as much as possible from Google these days—and if I could find something better that isn’t WordPress, I’d go through the hassle of moving everything. Perhaps my longest-running attempt has been using Tines to keep notes and to-do lists organized. And yet, I’m always keeping my nose in the air, sniffing for a better way of doing things.

Recently, I started looking at journaling apps. Day One came highly recommended, and had the huge advantage of both Mac and iOS apps that talked to each other. I gave it as honest a chance as I could—downloading both the MacOS and iOS apps—and even with the daily prompt on my phone, I never really warmed up to it.

Someone suggested Logseq last week, and it sounded interesting enough to give it a try. The developers describe it as “a privacy-first, open source knowledge base,” and the videos they link to from the home page (an enthusiastic user who describes how to make the most of it) convinced me to take the plunge. It can run either as a webapp, writing to your hard drive, or as a standalone desktop app. The developer says he was heavily influenced by Roam Research, Org Mode (Logseq supports Org Mode, although it defaults to Markdown), Tiddlywiki, and Workflowy.

The interface looks invitingly plain, at first. You’re presented with a journal page with today’s date on it, but otherwise blank. Start typing, and it supports Markdown (a big plus)… oh, wait. It’s also an outliner (and given my long-term relationship with Tines, that’s another big plus). Oh, type two square brackets and enter a title, like a Wiki link, and you get a new page that you can click to enter (yeah, I’ve always been fond of Wikis). Oh, type /TODO Download the desktop app on a line, and you get a to-do entry. Put hashtags on entries you think you’ll need to come back to later.

Clusters show groups of related entries
This seems like a chaotic way of doing things… until you display the graph. Then, Logseq gathers up all those scattered to-do entries, those hashtagged items, and other things, and pulls them all together. Some call this a digital garden or digital knowledge garden—me, I just call it software magic. Of course, you get out what you put in. You can create custom queries to pull things together based on how you want them. If you leave it running overnight, click the little “paw print” icon at the top left to open the new day’s journal page. Maybe this is what makes Logseq so much more approachable for me.

It took me a day or two to realize that this is the most natural approach for working with Logseq. There’s a lot of layers to it, and this brief post isn’t doing it justice. I’m using it at home with the desktop app, and at work using the webapp (because Doze complained about how not safe the desktop app was). Either approach works fine.

There’s a mobile app called Obsidian that can be set up to work with Logseq’s files, but it’s a private beta right now and I don’t need it just yet.

Now I have to figure out how to pour all the different entries in all the different paper and pixel systems I’ve accumulated over the years into Logseq. Someone wrote a script to convert Google Keep to Markdown, so that’s settled. I hope I can write a script to pull all my old Journler entries in.

Thursday, April 09, 2020

Life and Work in the Time of Pandemic (part 3, school)

We’re on spring break this week… like I said in the last post, we were supposed to be at the beach, but having to cancel a vacation falls into the #firstworldproblems bucket.

The two-week “online learning” got extended to next week… then just before break, they finally realized the wisest course was to finish out the school year online. It’s a pain in the rear, but better that than getting a bunch of people sick without need (or the resources to take care of them).

I need to say, the school system obviously meant the whole online learning program to be something used once or twice over the winter, maybe for a few days. Now they’re having to adapt it for a months-long outage. My biggest beef with it is that they couldn’t settle on a single app or website to manage everything—there are three or four apps/sites, and they occasionally roll out another one. Although my Mac has a built-in password manager, I’ve gotten account fatigue over the years. So every time I get a memo about Yet Another Account to set up, they can hear my eyes rolling all the way out here.

“Out here” presents its own online learning issues. This is farm country, and I still think it’s amazing we have DSL. It usually works OK, unless heavy storms take out a line card (which happens pretty often)… or everybody who can work at home is doing just that and their kids are also online trying to do their schoolwork. It gets where the connection can’t even support a low-bandwidth music stream. I can do the normal work things—email, edit a DITA file from the cloud, chat—because that traffic is mostly short bursts and can slide in between the school traffic. Conferences are more iffy, but I usually use my cellphone for audio and the video is showing mostly static images of spreadsheets or documents.

But I digress. Our school is swimming against the Zoom current (an app I only heard about after the isolation began) and using Google Meet (aka Hangout). We can often manage one hangout at a time, or at least phone in if bandwidth is an issue. Charlie’s therapists (and pre-school) are also using Meet. Most of the time, this seems to work out. The bandwidth is hitting Daughter Dearest even harder, because she’s a teacher and has to be online. It got so bad that she simply bypassed the wheezing DSL and used her phone to get out. Needless to say, that burned through our data cap, our reserve, and then some. Now we don’t have a cushion for this month. I suggested she go to Dunkin', get a coffee and maybe a doughnut from the drive-thru, then sit in her car and scarf the Wi-Fi from there.

Often enough, our connection is marginally good enough, so DD and her kids have been at the manor most school days. That means I’ve had AJ (or Charlie) in my lap more than once during a conference call. I can totally derail a meeting by turning on my camera with either one; they’re both cute.

While we’re on break, I’m trying to set up a place in the larger upstairs bedroom (The Boy’s old room) as an office space. We made some headway yesterday. But I can’t help but think that once I get upstairs, all the kids will wander (or be sent) upstairs so I can still deal with something. I guess that’s okay, as long as I hold my end up at work. So far so good!

How about you? Have you torn a bumper sticker off your van yet? Comments are open!

Tuesday, March 10, 2020

Adventures of a #techcomm Geek: Match Game, 2020

It’s been a while since I did one of these, and this one goes in deep.

We’ve been using DITA at work for a year or two now, but rarely is there time to go back and take advantage of the things it offers, retrofitting those things into the documentation we brought in. (Docs we’ve created since then seem to get more thorough treatment.)

One of those things is reuse. It’s easy to reuse an entire topic in a different book—even if it was duplicated. “Hey,” says a writer, “that’s the same thing. Let’s throw away topic B and use topic A.”

DITA also supports reusing common paragraphs in two or two dozen topics, but that’s a little harder. First, you have to recognize that paragraph. Then, you have to create a new topic (a collection file), copy the paragraph into the collection file, and assign it an ID. Then you have to replace the duplicated text (in topics) with a content reference (a/k/a conref). It’s a worthwhile thing to do, because you might say the same thing slightly differently otherwise. Still, who wants to go through an entire book (or worse, set of books), looking for reuse candidates?

Of course, you can always let a computer do the tedious work… if you know how to tell it what to do.

Preparing the (searching) grounds

A while back, I wrote my first useful Python scripts. One takes a particular JSON file and reformats it as a DITA reference topic, containing a table with the relevant data from the JSON file. Another walks through a CSV file, grabbing the columns I need, and producing topics documenting a TR-069 data model. Both scripts take advantage of a vast library of pre-written code to parse their input files.

It occurred to me that, if I were to find (or create) a way to export all the text from a DITA book into a CSV file, I could use a Python script to compare each paragraph to all the others. Using fuzzy matching would help me find “close enough” matches. That was a while ago; I got bogged down trying to get properly-formatted text out of DITA.

Last week, I got bored. Someone on the DITA-OT forum mentioned a demo plugin that translated DITA to Morse code, and the lightbulb in my head went on. If I could modify that plugin to just give text instead of -.-. .-. .- .--. then maybe I’d have what I needed.

It was an abject failure. What I need is one line per block element (paragraph, list item, etc). What I got was one line for the entire topic, sometimes with missing spaces. I put that aside, but realized that DITA-OT can also spit out Markdown. If I could convert Markdown to plain text, I’d be ready to rock!

So you want to convert DITA to Markdown? It’s easy, at least with the newer toolkits:

dita --format=markdown_github --input=my.bookmap --args.rellinks=none

The DITA-OT output continues to be topic-oriented, writing each topic to its own file. That wasn’t quite what I wanted, or so I thought at the time. Anyway, we have Markdown. How do we get plain text out of it, with each line representing a block element?

Turns out that pandoc, the “Swiss Army knife for converting markup files,” can do it:

pandoc -t plain --wrap=none -o topic.txt topic.md
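
Converting the whole directory of topics is just a loop around that—something like this sketch, assuming the Markdown files from DITA-OT are sitting in the current directory:
for f in *.md; do
    pandoc -t plain --wrap=none -o "${f%.md}.txt" "$f"
done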

In the heat of problem-solving, I realized I didn’t need a CSV file… or Python. I could pick up Awk and hammer the text into shape. My script simply inhaled whatever text files I threw at it, and put all the content into an array indexed by [FILENAME,FNR] (FNR is basically the line number inside the current file). There was a little stray markup left, not to mention some blank lines, and a couple of Awk rules threw unneeded lines into the mythical bit bucket.
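
The slurping-in part of the script is only a few rules, roughly like this sketch (the cleanup patterns here are placeholders—the real ones depend on what pandoc leaves behind):
/^[ \t]*$/ { next }              # toss blank lines
/^(<!--|\[)/ { next }            # toss stray markup (example pattern only)
{ para[FILENAME, FNR] = $0 }     # everything else goes into the big array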

Got a (fuzzy) match?

A typical match is an all-or-nothing Boolean: you get true (1) if the strings are an exact match, or false (0) if they aren’t.

Fuzzy matching uses the universe of floating-point numbers in between 0 and 1 to describe how close a match is. It’s up to you to decide what’s close enough, but you usually want to focus on values of 0.9 and higher. And yes, an exact match still gives you a score of 1.

Why do we want to do this? Unless content developers are really good about cutting and pasting in a pre-reuse environment, inconsistencies creep in. You might see common operations described in slightly different ways:

Click OK to close the dialog.
Click OK to close the window.

So along with flagging potential reuse candidates, a fuzzy match can help you be consistent.

Python and Perl have libraries devoted to fuzzy matching. There are several ways to do a fuzzy match, but one of the more popular is called the Levenshtein distance. There's a scary-looking formula at the link, but it boils down to single-character edits (addition, deletion, or replacement). The distance between “dialog” and “window” is 4 (d→w, a→n, l→d, g→w).

But this is an integer, not a floating-point number between 0 and 1! That’s easy to fix, though. If l1 and l2 are the lengths of the two strings, and d is the calculated Levenshtein distance, then the final score is (l1+l2-d)/(l1+l2). For the two example sentences above, the score is 0.93—the strings are 93% identical.

There are websites with Levenshtein distance implementations in all sorts of different programming languages, although the ones written in Awk are not as common. But no problem. Awk is close enough to C that it’s simple to translate a short bit of code. I picked the second of these two. There was one already written in Awk, but it took a lot more time to grind through a large set of strings.
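
If you’re curious, the translated function comes out something like this—a sketch of the usual dynamic-programming approach rather than the exact code in my script—along with the scoring wrapper:
function levenshtein(s, t,    m, n, i, j, cost, d) {
    m = length(s); n = length(t)
    for (i = 0; i <= m; i++) d[i, 0] = i
    for (j = 0; j <= n; j++) d[0, j] = j
    for (i = 1; i <= m; i++)
        for (j = 1; j <= n; j++) {
            cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
            d[i, j] = d[i-1, j] + 1                                        # deletion
            if (d[i, j-1] + 1 < d[i, j]) d[i, j] = d[i, j-1] + 1           # insertion
            if (d[i-1, j-1] + cost < d[i, j]) d[i, j] = d[i-1, j-1] + cost # substitution
        }
    return d[m, n]
}

# 1 means identical; anything at 0.9 or above is worth a look
function score(s, t,    l1, l2) {
    l1 = length(s); l2 = length(t)
    return (l1 + l2 - levenshtein(s, t)) / (l1 + l2)
}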

Save time, be lazy

The time it takes is important, because it adds up fast. Given n paragraphs, each paragraph has to be compared to all the rest, so you have n² comparisons. A medium-sized book, with 2400 paragraphs, means 5.76 million comparisons. Given that a fuzzy comparison takes a lot longer than a boolean one, you want to eliminate unnecessary comparisons. A few optimizations I came up with:

  • It’s easy to get to (n²-n) by not comparing a string to itself. We also do a boolean compare and skip the fuzzy match if the strings are identical. Every little bit helps. Time to analyze 2400 paragraphs: 2 hr 40 min. My late-2013 iMac averages about 600 fuzzy match comparisons per second.
  • By deleting an entry from the array after comparing it to the others, you eliminate duplicate comparisons (once you’ve compared A to B, doing B to A is a waste of time). That eliminates noise from the report, and cuts the number of comparisons required in half. Time to analyze 2400 paragraphs: 1 hr 20 min. Not bad, for something you can do with one more line of code.
  • Skip strings with big differences in length. Again, if l1 and l2 are the lengths of two strings, then the minimum Levenshtein distance is abs(l1-l2). If the best possible score doesn’t reach the “close enough” threshold, then you don't have to do the fuzzy match. Time to analyze 2400 paragraphs: 5 min 30 sec!!! Now that’s one heck of an optimization!
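
Put those three optimizations together, and the comparison loop looks something like this sketch (threshold of 0.9, score() as above):
END {
    threshold = 0.9
    for (k in para) keys[++n] = k            # snapshot the keys so deleting as we go is safe
    for (i = 1; i <= n; i++) {
        a = keys[i]
        for (b in para) {
            if (a == b) continue                                     # never compare a string to itself
            l1 = length(para[a]); l2 = length(para[b])
            gap = (l1 > l2) ? l1 - l2 : l2 - l1
            if ((l1 + l2 - gap) / (l1 + l2) < threshold) continue    # best possible score is still too low
            s = (para[a] == para[b]) ? 1 : score(para[a], para[b])   # exact match skips the fuzzy work
            if (s >= threshold) printf "%.2f\t%s\t%s\n", s, a, b
        }
        delete para[a]                       # A vs B is done; never bother with B vs A
    }
}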

So we’ve gone from something you run overnight, or at least during a long lunch break, to something that can wrap up during a coffee break (eliminating 96.5% of the time needed is a win no matter how you look at it). Now if your book is all blocks of similar length, it will take longer to grind through them because there isn’t anything obvious to throw out.

Still, this is down to the realm where it's practical to build a “super book” (a book containing a collection of related books) and look for reuse across an entire product line. That might get the processing time back up into the multiple-hours realm, but you also have more reuse potential.

Going commercial

The commercial offerings have some niceties that my humble Awk script does not. For example, they claim to be able to build a collection file (a “library” of sorts, containing all the reusable paragraphs) and apply it to your documentation. That by itself might be worth the price of entry, if you end up with a lot of reuse.

They also offer a pretty Web-based interface, instead of dropping to the command line. And, they have likely implemented a computing cluster to grind through huge jobs even faster.

But hey, if you’re on a tight budget, the price is right. I’m going to make sure the employer doesn’t have a problem with me putting it up on Github before I do it. But maybe I’ve given you enough hints to get going on your own.
UPDATE 10 May 2020: The script is now available on Github.

Tuesday, July 02, 2019

Tech Tuesday: Y'all Watch This

While we were cleaning up the house, a Casio wristwatch turned up. Sizzle didn’t claim it, and I had never seen it before. Mason thought he might like to have it, once it had a battery in it. It has an analog face with a small LCD display, not terribly geeky-looking. I’m used to Casio watches having a tiny keypad and a zillion functions.

So I got out the little tools and the magnifying light, and popped the back off. There was the battery, but the number (16xx) was a new one, and there was a small metal clip holding it down. Not seeing an obvious way to get it off, I did what anyone does these days: Googled for instructions. The watch is a Wave Ceptor, and the first thing that popped up is “this watch has a solar panel and a rechargeable battery.” It was a bright sunshiny day, so I put it back together and stuck it in the window. Sure enough, after a few hours, the little display coughed to life, showing what I thought at first was t  1.

We get signal.
Having no idea what to do next, I headed back to Google to find a manual. That’s when I found out you need only set the timezone and the watch does the rest, using a long wave receiver at night to download the time from WWVB (if using a North American timezone). The link has all the gory details, but WWVB transmits time data at a blazing 1 bit per second (actually, it's a tri-state, with values of 0, 1, or marker—does that make it a trit?). The watch tries receiving at the top of each hour from midnight to 4 a.m.

Just for grins, I watched it the other night. Sure enough, it showed its “receiving” display at 1 a.m., made a small adjustment, and moved on.

The other thing I found out was that the first display I saw wasn’t a lowercase T, it was 土 (an abbreviation for Saturday in Japan). The watch’s epoch (first time) is January 1, 2000. It also has a “Y2.1K” problem, in that its year doesn’t go past 2099. If it’s still around then, I guess one of Mason’s kids will have it.

So once again, Casio made a geeky watch—but this time, they hid the geekiness on and under the face. Oh, and it does have a stopwatch, alarms, and a “world time” mode (uses the little LCD to show the time in a second timezone). It has a light, but the hands and numbers are phosphorescent, so you can at least see what time it is in the dark without using the battery. Putting it in a sunny window for half an hour is more than enough to keep it running another day.

So we tried to put it on Mason, and his wrist is too skinny for a large-face watch like this one. I’d been wearing an iFitness watch for a while, but it often misses steps and has lately developed a habit of trying to pop out of the band (I lost it for over a week that way). It has a decent sleep monitor, and my phone does a better job of counting steps, so now I wear it at night and the Wave Ceptor during the day.

Thursday, May 16, 2019

Adventures of a #techcomm Geek: Sharp Edges when Rounding

One of the advantages of using a text-based markup grammar for documentation—these days, often XHTML or some other XML, but could be Markdown, Restructured Text, Asciidoc, or even old-sk00l typesetting languages like troff or TeX—is that they’re easy to manipulate with scripts.

There are quite a few general-purpose scripting languages that do a fine job of hunting down and acting on patterns. I’m conversant with Perl, and am learning Python; but when I need to bang something out in a hurry and XML is (mostly) not involved, Awk is how I hammer my nails. Some wags joke Awk is short for “awkward,” and it can be for those who are used to procedural programming. Anyone exposed to event-based programming—where the program or script reacts to incoming events—will find it much more familiar. Actually, “awk” is the initials of the three people who invented it: Aho, Weinberger, Kernighan (yes, that Brian Kernighan, he who also co-wrote the definitive book on the C language and was a major player on the team that invented Unix).

Instead of events, Awk reacts to patterns. A pattern can be a plain string, a variable value, a regular expression, or combinations. Other cool things about Awk:
  • Variables have whichever type is most appropriate to the current operation. For example, your script might read the string “12.345,” assign it to x, then you can use a statement like print x + 4 and you’ll get 16.345.
  • The language reference (at least for the original Awk) fits comfortably in a manpage, running just over 3 pages when printed. Even the 2nd edition “official” reference is only 7 pages long.
  • It’s a required feature in most modern Unix specifications. That means you’ll always have some version of Awk on an operating system that has some pretensions to be “Unix-like” unless it’s a stripped, embedded system. On the other hand, even BusyBox-based systems include a version of Awk. Basically, that means Awk is everywhere except maybe your phone. Maybe.
If your operating system is that Microsoft thing, you can download a version of Awk for it. If you install the iSH app, you can even have it on an iPhone.

Now what am I going to do with it?

Okay. I told you all that to tell you this.

I’m working on something that extracts text from a PDF file, and formats it according to rules that use information such as margin, indent, and font. It requires an intermediate step that transforms the PDF into a simple (but very large) XML file, marking pages, blocks, lines, and individual characters.

“But wait a minute!” you say. “I thought Awk only worked on text files. How does it parse XML?”

Like many useful utilities first released in the 1970s, Awk has been enhanced, rewritten, re-implemented from scratch, extended, and yet it still resembles its ancestral beginnings. The GNU version of Awk (commonly referred to as gawk) has an extensions library and extensions for the most commonly-processed textual formats, including CSV (still beta) and XML. In fact, the XML extension is important enough that gawk has a special incantation called xmlgawk that automatically loads the XML extension.

The neat thing about xmlgawk, at least the default way of using it, is that it has a very Awk-like way of parsing XML files—it provides patterns for matching beginnings of elements, character data, and ends of elements (and a lot more). This is basically a SAX parser. If you don’t need to keep the entire XML file in memory, it’s a very efficient way to work with XML files.

So. In most cases, I only need the left margin of a block (paragraph). Sometimes, I need the lowest extent of that block as well, to throw out headers and footers. I need to check the difference between the first and second line (horizontally), and possibly act upon it.

In the document I used for testing, list items (like bullets) have a first line indent of –18 points. “Cool,” I said. “I can use that to flag list items.”

All well and good, except that it only worked about 10% of the time. I started inserting debugging strings, trying to figure out what was going on, and bloating the output beyond usefulness. Finally, I decided to print the actual difference between the first and second lines in a paragraph, which should have been zero. What I found told me what the problem was.

    diff=1.24003e-18

In other words, the difference (between the integer and floating-point representations of what should be the same number) was so minuscule as to matter only to a computer. Thus, instead of doing a direct comparison, I took the difference and compared that to a number large enough to notice but small enough to ignore—1/10000 point.
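
In awk terms, the check boils down to something like this sketch:
# treat two coordinates as equal if they're within 1/10000 point of each other
function nearly_equal(a, b,    diff) {
    diff = a - b
    if (diff < 0) diff = -diff
    return diff < 0.0001
}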

And hey presto! The script behaved the way it should!

It’s a good thing I’ve been doing this at home—that means I can soon share it with you. Ironically, it turns out that we might need it at the workplace, which gives me a guilt-free opportunity to beta-test it.

Thursday, February 28, 2019

Adventures of a #techcomm Geek: Info Architecture

In this adventure, we find that structure isn’t always structure. And sometimes, the structure jumps up and smacks you to get your attention. More geekiness follows…


Image: openclipart.org
As part of our conversion to DITA at work, I shuffled some things around in the huge manual I work on. I moved a huge wad of reference material into an appendix; other content can easily link to it when needed. But the reshuffling got me to take a look at the reference material.

Managed network devices, like the ones I usually write about for work, usually have a way to message the mothership about various issues. Examples include:


  • Hi, I’m online.
  • The power’s out here. I’m running on my battery.
  • Here’s some stats from the last connection.
  • One of my components just failed!


The messages aren’t that chatty, of course, and they often include some variable data. Some are more urgent than others, and might require some action by the network operators.

I had separate topics describing each message, and they came out of the conversion tool as concept topics—a lot more generic than I wanted. As I was trying to get everything done at once, I didn’t give it too much thought. Since the messages were reference material, they would be fine as references. I split them into sections (format, severity, cause, action), and moved on.

DITA to the rescue? Um… nope.


Later on, I came back to the messages. “There has to be a better way,” I thought. After all, the sections could get out of order, or end up with different titles—there’s all sorts of ways to be inconsistent with reference topics. My next thought was, “Hey, DITA has hundreds of elements, and its prime purpose is software documentation. There's probably an entire message domain waiting for me.”

In reality, there are three message-related elements in the entire ocean of DITA, and two of them are inline (<msgph> and <msgnum>). The third is <msgblock>, for tagging message output.

Ah, the joys of information architecture. Creating a message domain from scratch was a possibility, but would likely be a hard sell to the co-workers.


We’re in trouble(shooting) now


I gave a moment to the idea of using troubleshooting topics—then it hit me. A message has a condition (the message itself), a cause (why it was logged), and a solution (what to do about it). That’s exactly the structure of a troubleshooting topic!

The only sticky point was where to document the message format, and I quickly decided that was part of the condition. I used @outputclass="message" to tag the topics, and to have the transform use Format: instead of Condition: for the condition part. I converted a few to troubleshooting topics, and it worked as well as it seemed it would.
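
A converted message topic ends up shaped roughly like this skeleton—the elements come from the standard DITA 1.3 troubleshooting topic, but the message, IDs, and wording here are invented for illustration:
<troubleshooting id="msg_ac_power_loss" outputclass="message">
  <title>AC Power Loss</title>
  <troublebody>
    <condition>
      <p>Format: POWER SUPPLY: running on battery</p>
    </condition>
    <troubleSolution>
      <cause>
        <p>The device lost AC power and switched over to its battery.</p>
      </cause>
      <remedy>
        <steps>
          <step><cmd>Dispatch a technician if power is not restored before the battery runs down.</cmd></step>
        </steps>
      </remedy>
    </troubleSolution>
  </troublebody>
</troubleshooting>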

On to the next thing


Then yesterday, I got a meeting invite with an attachment, a follow-up to a discussion a few of us had last week. One of the groups in our far-flung department uses InDesign to produce actual printed deliverables (how quaint!). The fun part is, the page size is about 4 inches square—so it’s not a matter of tweaking our transform plugin; we need a whole new one.

But when I started looking at it, the structure almost leaped off the screen, despite a couple of misplaced pages. Each chapter contained a single task, and each step used one page for substeps and graphics. Having that revelation made the call go a lot faster and more smoothly, because it was one of those things that are obvious once you see it. I just happened to be the first one to see it.

So I did a conversion dance, involving lots of pixie dust: PDF → Word, then Pandoc converted that to Markdown. After some serious cleanup (and moving misplaced content where it belonged), I used a couple of scripts to break the Markdown file into topics and create a bookmap. DITA-OT gobbled up the bookmap and Markdown topics, and spit out DITA topics. Thus, I had a pilot book we can use as test data for the transform.
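
Boiled down, with the file names invented for the example, the pixie dust looked something like this (the cleanup and topic-splitting scripts in the middle were one-off affairs):
# PDF exported to Word by hand, then:
pandoc -f docx -t markdown -o pilot.md pilot.docx
# ...clean up pilot.md, split it into topics, generate bookmap.ditamap...
dita --input=bookmap.ditamap --format=dita --output=out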

The InDesign users also have a couple more formats; one is close enough to a regular book that we’ll have them use the standard transform. The other is a folded four-panel sheet… that one is going to be interesting. I’m going to have to resist the temptation of blowing off documentation work for glorious coding.

Stay writing… until I geek again.

Wednesday, December 12, 2018

Adventures of a #techcomm Geek: Blurrier Image

In today’s installment of Life of a #techcomm Geek, we return to a subject that draws this geek like a moth to flame: file conversions. Hazardous, yet compelling. Lots of geeky stuff follows…


I’ve had this particular line in my Tines to-do list for a while. As part of our transition to a new documentation system, I and another writer handled the conversions. We had a high-end tool to help us out, although creating rules was a dicey proposition and the vendor ended up helping (we made tweaks where they could make an obvious difference, though).

In the most recent round, we got to the FrameMaker-based docs. Frame (as its users often nickname it) is unique in that it allows overlaying callouts and other graphic elements on top of images. This is a huge help for translating manuals, because the writers don’t have to maintain a separate set of graphics for each language. Anyway, since the new system isn’t FrameMaker, something else had to happen. The conversion system could be configured to either flatten the images (convert to a PNG, rasterizing the callouts) or create an SVG (Scalable Vector Graphics). We chose the latter, thinking that since SVG is an XML format, the new system could maintain them easily.

We were wrong.

Long story shortened considerably, we eventually threw up our hands and decided to convert all the SVGs to “flattened” PNG files. The writers would keep the SVG files on their hard drives to make changes, then upload a new flattened PNG when needed. I wrote a script to do the deed; it crunched through hundreds of SVGs at about one per second, and updated all the links in the book to point to the new PNGs.

All well and good, until one of the writers went to publish. “The images look blurry,” she told me. One look confirmed she was obviously right. It took me about three seconds to figure out why.

You see, our SVG files have a width attribute, which was set to the width in the original FrameMaker files (a typical width is 576 pixels, which at 96dpi is 6 inches even). All well and good, but the original images run about 1200 pixels wide—so in essence, we were throwing away over ¾ of the image data when doing the conversion. No wonder it looked blurry! But we were all weary of messing with it by that point; I had written scripts that:

  • extracted embedded images from an SVG, converted them to PNG, then changed the link so the SVG referred to the file instead
  • went the other way, embedding images in an SVG
  • converted the entire mess to PNG in one swell fwoop

The documentation work that was my primary job function had been back-burner’ed for too long. I added an “investigate this further” item to my backlog list and got back to the bread-and-butter part of my job.

This week, I all but cleared a fairly long to-do list in three days, so I thought maybe I could give this thing another shot. A quick Google turned up some promising code on superuser.com; I divided the image width by the scaled-down width in one SVG, applied the script, and got a nice sharp image! The only problem with that is, it would take about 10 minutes to do each file by hand, and there are hundreds. A script is the only practical way to blast through all of them.

When I tackle a situation like this, I tend to use a shell script to drive awk, Perl, and XSLT scripts. Each has its strengths, and trying to force (say) XSLT to work some of awk or Perl’s string-processing magic is more trouble than it’s worth. And vice versa. So… XSLT to extract the file name and (scaled) width, awk to parse the output of file (a utility that returns the dimensions of an image file) and do the calculations, all wrapped up in a shell script to conduct the Geek Orchestra.
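
The awk part of the orchestra is tiny. Assuming file reports something like “PNG image data, 1200 x 675” (and that the XSLT step already handed over the SVG’s declared width—$png and $svgwidth here are placeholders), the sketch looks like:
realwidth=$(file "$png" | awk '{ print $5 }')                      # fifth field is the pixel width
scale=$(echo "$realwidth $svgwidth" | awk '{ printf "%.4f", $1 / $2 }')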

Of course, I ran out of time this afternoon to put the whole thing together, but I have all the sub-script logic down. I just need to score the symphony. That will likely take me to noon tomorrow, then I’ll be back to bugging people already bogged down with too much stuff to lend me their expertise.

I also achieved Inbox Zero at work today… and that’s a rant for another time.

Tuesday, October 23, 2018

(Partially) Disconnected

A couple weeks ago, I noticed my phone was starting to discharge a lot faster than normal. Thinking I had issues with an app not being cooperative, I checked the app consumption levels in Settings and made a couple of adjustments. I usually could get a day and a half out of my phone with normal use, plugging it into the car charger on the way in or out of the office if needed. But it got to where normal use gave me about five hours of battery life.

I finally set up a call with Apple support, and the tech set me up with a repair ticket. In case you weren't aware, Apple is replacing batteries (if needed) on certain iPhone models (including iPhone 6, my particular phone) for $29… and $5 shipping if you do it by mail instead of bringing it in. Seeing that a DIY battery replacement was $25 about three years ago, I figured this was a no-brainer.

Fits in the palm of your hand… with room to spare.
I was now temporarily phone-less. Or was I? When I got the 6, I retired my iPhone 4, the one I'd replaced the battery in, repurposing it essentially as an expensive iPod touch. The SIM won't fit in it, which I expected, so it’s Wi-Fi only. I can still message the wife and DD, and even make and take calls using FaceTime. I was hoping to use it for Skype, but the app only gives you the option to upgrade and the current version won't work on the older phone. (thankyouverymuch, Microsoft)

Whatever. Where there's Wi-Fi, there's the ability to contact family, anyway. I spent the weekend weeding off old pictures and older messages (this phone was in service from 2012 to 2016), and got a GB or so freed up to download podcasts for the commute. I'll probably start deleting apps soon, starting with Twitter—it crashes too often, and of course I can't upgrade it. Next up will be stuff I never use or works equally poorly. That should get me through the week, then I should get my primary phone back.

Quite the size difference
One nice thing is getting re-acquainted with several games that are no longer supported on the newer systems—Bejeweled 2 and Sudoku Mania, to name two. Some newer games still work on it as well… Smash Hit is a surprising example that does occasionally hit a frame-rate stutter. But it feels so tiny, reminding me of my skepticism about a phone that was too long to nestle down into a shirt pocket. I guess I adjusted quickly.

The other thing I like is the speaker dock. It, like the 4, has the older dock connector. So I can't put the new phone on it. I guess when I get my primary phone back, I'll erase the old one and find someone who needs it more than me. They can also take the speaker dock, since it doubles as a charging station.

But it'll be nice to hang out with the '4 for the week ahead. Our last hurrah, so to speak. Charlie glommed it this evening, and was adamant about not giving it back, so I pulled up the Bubbles app for him (one of those that doesn't work on newer phones) while he clung to it and fussed. (I guess he figured a little bitty phone is meant for a little bitty user.) He played with it for half an hour, maybe more. I remember letting Mason play with the phone when he was like 3 or 4, and he discovered the Camera app. I locked my '6 when I got it, but he had access to an iPad mini (and an original iPad) by then. He still tries to wheedle my passcode out of me, though.

I had a bit of heartburn this evening, when I received a shipping box from Apple (2 days after I sent the phone off!). I got on the chatline with Apple Support, and they verified the box had been shipped by mistake and everything was in the queue. Oh well, now I have a SIM remover tool… no more paper clips!

'Course, this means I'll be hard to reach this week. Email me… or leave a comment here!

Monday, October 16, 2017

Tines 1.11.1

… is out. This is a quick bug-fix release: the PageUp and PageDown keys should now work properly (on Macs, use Fn-up-arrow and Fn-down-arrow). I’ve also merged the dev branch into the master branch on GitHub.

To make things more convenient, I registered tines-outliner.org and set it up as a shortcut to the repository.

If you aren’t familiar with Tines, it’s a console-mode outliner. It runs in MacOS X, Linux, and the Microsoft thing (using Cygwin). It’s unique in that it supports a “text” tag for entries, and so can differentiate between headings and body text. It can export to Markdown, HTML, *roff, and plain text formats. You can export your outline to Markdown and pull it into Scrivener. See Getting Your Outline into Scrivener (pt 2) for details.

Tines, like a few other outliners, has support for to-do lists (basically a collection of entries with checkboxes). That means you can use it to keep outlines, goals, snippets of scenes, and notes about your stories in a single place.

Compile?

Yes, the word “compile” is composed of the Latin com (together) and the English pile (a random heap, or hemorrhoid). So, yes,  compile means either to throw things together, or a multifaceted pain in the @$$. Still, if you need an outliner, it’s available!

I want to get back to working on an install package, at least for MacOS X. I’ll probably have to leave packages for the other operating systems to their own experts (not that I’m anything like an expert in MacOS X packaging, mind you).

Tuesday, March 21, 2017

Tech Tuesday: Roll Your Own Writing System, part 6: Jekyll


The series rolls to an end…

In Part 1, we had a look at Markdown and the five or six formatting symbols that cover 97% of written fiction. Part 2 showed how you can use Markdown without leaving the comfort of Scrivener. Part 3 began exploring eBook publishing using files generated from both Scrivener and directly from MultiMarkdown. Part 4 provided a brief overview of a different tool called Pandoc that can convert your output to a wider variety of formats, and is one way to create print documents for beta readers or even production. Part 5 described how to use MultiMarkdown’s transclusion feature to include boilerplate information in an output-agnostic way, and how to use metadata variables to automatically set up front matter.

Scrivener is an excellent writing tool, and we have seen how using it with MultiMarkdown only makes it better. But there are conditions where abandoning the GUI for a completely text-based writing system just makes sense. For example, you might want to go to a minimalist, distraction-free environment. You may want to move to a completely open-source environment. Or you might need to collaborate with someone else on a project, and Scrivener really isn’t made for that.

Don’t Hyde from Jekyll


Jekyll is the most popular static site generator. You write in Markdown—Jekyll’s particular flavor, which is similar to MultiMarkdown in many ways—and if Jekyll is running, it automatically converts your pages to HTML as soon as you save. It even includes a built-in web server so you can see what the changes look like.

If you’re on a Mac, installation is almost too easy. Drop to a command line, enter gem install jekyll bundler, and watch a lot of weird stuff scroll by. It’s as easy on Linux, if you have Ruby 2.0 or newer installed. On the Microsoft thing, there are some specific instructions to follow (I installed it on my work PC, no problem).

Once it’s installed, get going by following the quick-start instructions.
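
The quick start amounts to a handful of commands (the site name is whatever you want):
jekyll new mysite
cd mysite
bundle exec jekyll serve
# then point a browser at http://localhost:4000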

Organizing


Unlike Scrivener, organizing your project is on you. But there are a couple things that might help:

Each story or project should live in its own folder. Within that folder, tag each chapter or scene with a number to put everything in its proper sequence. For example:

100_chapter_1.md
110_arrival.md
120_something_happens.md
200_chapter_2.md
210_more_stuff_happens.md

It’s a good idea to increment by 10 as you create new scenes, in case you need to insert a scene between two existing ones later. To move a scene, change its number. If you have more than nine chapters, use four-digit numbers for the sequence. (If you need five-digit numbers, you should seriously consider turning that epic into a series of novels.)

Differences from MultiMarkdown


Like MultiMarkdown, Jekyll’s flavor of Markdown supports variables and transclusion. But there are a couple differences. In Jekyll, variables look like MultiMarkdown’s transclusion:

{{ page.title }}

You can draw variables from the page’s metadata, or from the _config.yml configuration file (in which case you replace page with site).

Transclusion is a function of the Liquid templating language, built into Jekyll. To include a file:

{% include_relative file.md %}

You can also use include instead of include_relative to pull files from the _includes directory. By using Liquid, you can specify parameters to do different things, effectively creating your own extensions.

For example, here’s how you might do section breaks:

<p class="sectionbrk">
  {% if include.space %}&nbsp;{% else %}&bull; &bull; &bull;{% endif %}
</p>

So if you just enter {% include secbrk.html %}, you get three bullets. To get a blank line, enter {% include secbrk.html space="true" %} instead.

Also like MultiMarkdown, Jekyll supports a metadata block at the beginning of a file. While they look very similar, Jekyll uses YAML format for its metadata. The upshot is, a Jekyll file begins and ends its metadata with a line of three or more dashes, like this:

---
title: The Sordid Tale of Woe
author: Henrietta Jekyll
permalink: /sordid/sordid_tale.html
---

Certain metadata tags are special to Jekyll. For example, permalink specifies the name and location of the HTML file Jekyll creates from the Markdown source. Another important tag, layout, can be used to choose a template. You can set the default layout in the configuration file, then use a second configuration file to override it for doing things like publishing.
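
For example, you could set the everyday layout in _config.yml and keep a small override file (call it _config_pub.yml—the name is arbitrary) for publishing:
# in _config.yml
defaults:
  - scope:
      path: ""
    values:
      layout: "default"

# when publishing, layer the override file on top:
jekyll build --config _config.yml,_config_pub.yml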

Git Out


Jekyll is also a blogging tool. Your posts go into a special directory, _posts, and have a specific naming convention. Two additional metadata tags are important:

date:   2017-03-21 07:00:00 -0500
categories: writing technology

The date entry specifies the date and time your post goes live on the generated site. The categories entry lets you tag each post for easier searches.

But all that’s just pixels on the screen unless you have a place to put your site. That’s where Github Pages comes in. You can upload your Jekyll files to Github Pages, and it automatically updates your site when it finds new or changed content. This is pretty useful, but it’s even more useful when you’re working with other people. Everyone has their own copy of the source files on their own computers, and they can each push (update) their changes as needed.

Now What?


I hope I’ve given you some ideas for new ways of looking at your writing, and how to make the publishing part more efficient and more collaborative.

The rest… is up to you. I’d love to see your own ideas in the comments.

Tuesday, March 14, 2017

Tech Tuesday: Roll Your Own Writing System, part 5: Reuse

The series rolls on…

In Part 1, we had a look at Markdown and the five or six formatting symbols that cover 97% of written fiction. Part 2 showed how you can use Markdown without leaving the comfort of Scrivener. Part 3 began exploring eBook publishing, using files generated both from Scrivener and directly from MultiMarkdown. Part 4 provided a brief overview of a different tool called Pandoc, which can convert your output to a wider variety of formats and is one way to create print documents for beta readers or even production.

Way back in Part 2, we used Scrivener to embed HTML separators between scenes and for internal scene breaks. As we saw last week, that doesn’t work when you need to output to a different format. As it turns out, there’s a way to work around that by using MultiMarkdown’s transclusion mechanism. Transclusion and metadata variables provide the capability for reuse, pulling common boilerplate files from a library.

Inclusion… Transclusion?


Transclusion is a technical term, but it’s easy enough to explain. You use it to embed another Markdown file into your document, much as you might include a graphics file. A function like this is essential when you’re maintaining a collection of technical documents: you write common sections or passages once, store them in a library of shared files, and a change to one of those files automatically flows into every document that uses it. For fiction writing, it’s a good way to pull in all the boilerplate files (about the author, front matter, and so on) that you need for each book.

To transclude a boilerplate file, put this on its own line:

{{myfile.md}}

When you run multimarkdown, it pulls in the contents of myfile.md and processes it.

Now here’s where it gets fun. Say you really need to be able to output to both HTML and OpenOffice. Instead of embedding HTML that gets ignored in the OpenOffice conversion, or vice versa, you can use a wildcard:

{{myfile.*}}

Now, when you output to HTML, multimarkdown transcludes the file myfile.html. When you want OpenOffice, it uses myfile.fodt. You just have to supply the files with the right extensions and content, and you’re off to the races! You can use this in Scrivener’s Separators pane to choose the right markup for your output.

A few caveats for fodt transclusion: You cannot use entities like &bull; or &#8026; to specify special characters. You have to enter them as characters. If you only have one line to add, you don’t need to put any OpenOffice markup in the fodt file—plain text is fine, but use the right extension so multimarkdown knows which file to use.
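As a sketch, a shared section-break file—call it secbrk, the name is arbitrary—might exist in two flavors, and {{secbrk.*}} picks whichever one matches the output format:

secbrk.html:
<p class="sectionbrk">&bull; &bull; &bull;</p>

secbrk.fodt:
• • •

Note that the fodt version uses literal bullet characters, per the caveat above.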

If you want to reuse transcluded files with other documents, you can add another line to the metadata:

Transclude Base: /path/to/your/files

You can use a relative path like ../boilerplate, but it’s safer to specify the entire path in case you move the file to some other location.

Does the Front Matter?


But transcluding boilerplate files is only the beginning. Especially for front matter, you need to change at least the title for each book. Fortunately, MultiMarkdown has that covered.

In Scrivener’s Compile window, the last entry is Meta-Data. Back in Part 3, you used this to specify a CSS file for HTML output. Scrivener pre-fills entries for the Title and Author, but you can add anything else you want here. All the metadata ends up at the beginning of the file, where MultiMarkdown can process it further.

So you might have a block that looks like this:

Title: Beyond All Recognition
Subtitle: The Foobar Chronicles, Book 1
Author: Marcus Downs
Copyright: 2017
Publisher: High Press UR

Create a title page that looks like this (for HTML output):

<div style="text-align:center" markdown="1">
**[%title]**

**[%subtitle]**

by  
[%author]

Copyright [%copyright] [%author]. All rights reserved.

Published by [%publisher]
</div>

![](logo.png)

{{TOC}}

Instant front matter! The {{TOC}} construct inserts a table of contents, another MultiMarkdown feature.

Now What?


Now you know how to include boilerplate files in your book, and how to automatically put the right text in each output format.

Next week… it’s something completely different to wrap up the series.

Tuesday, March 07, 2017 No comments

Tech Tuesday: Roll Your Own Writing System, part 4: MultiMarkdown and Pandoc

The series rolls on…

In Part 1, we had a look at Markdown and the five or six formatting symbols that cover 97% of written fiction. Part 2 showed how you can use Markdown without leaving the comfort of Scrivener. Part 3 began exploring eBook publishing, using files generated both from Scrivener and directly from MultiMarkdown.

Today, we’re going to take a brief look at a different tool you can use to publish MultiMarkdown files.

Pandoc describes itself as a Swiss Army knife for markup languages, but it goes farther than that. Beyond markup languages, it converts to and from common word processor formats and can even convert directly to EPUB. You can mess with templates to get the output really close to production-ready, but that’s a little beyond the scope of this series. In real terms, it’s not any faster than loading a prepared HTML file into a skeleton EPUB; both methods need a little cleanup afterwards.

This sounds at first like it’s just an alternative to using MultiMarkdown, but it goes a little farther than that. One problem with embedding HTML in your Markdown files is that none of it gets converted to other formats. So you can’t just take your MultiMarkdown file and create an OpenOffice file by running:

multimarkdown --to=odf story.md >story.fodt

If you do, all your section breaks disappear. Pandoc ignores embedded HTML as well… so again, what does Pandoc buy you?

Well, once you have your HTML file, you can use Pandoc to convert that HTML file to the word processor format of your choice.

pandoc -f html -t odt -o story.odt story.html

And there’s the answer to how you make your story available for beta readers who want a word processor file. If you’re willing to tolerate some sloppy typesetting, you could use it for your print document as well. Pandoc also supports docx and rtf as output formats.
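The same pattern covers those, too; for instance (output file names are whatever you like):

pandoc -f html -t docx -o story.docx story.html
pandoc story.md -o story.epub --metadata title="Beyond All Recognition"

The second line shows the direct-to-EPUB route mentioned above; Pandoc infers the output format from the .epub extension, and the title metadata keeps Pandoc from warning about a missing title.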

Now What?


Now you can output your MultiMarkdown file in a number of formats, including eBook (direct and indirect) and common word processor formats.

Next week, we’ll look at some special features of MultiMarkdown that you might find useful.

Comments? Questions? Floor’s open!

Wednesday, March 01, 2017 No comments

Tech Tuesday: Roll Your Own Writing System, part 3: Publishing MultiMarkdown

The series rolls on…

In Part 1, we had a look at Markdown and the five or six formatting symbols that cover 97% of written fiction. Last week, we saw how you can use Markdown without leaving the comfort of Scrivener.

This week, it's time to build an eBook using MultiMarkdown output. If you have been cleaning up Scrivener’s EPUB output in Sigil, you should find the process familiar—only, without most of the cleanup part.

First thing, output an HTML file through MultiMarkdown. In Scrivener, click the Compile button and select MultiMarkdown→Web Page in the dropdown at the bottom of the screen.

Under the Overhead

Open Sigil, then import your HTML into a new eBook—or better yet, a “skeleton” eBook with all the boilerplate files already in place.

All you have to do now is to break the file into separate chapters and generate a table of contents. You can save even more time by creating a custom text and folder separator in the last part of Scrivener’s Compile Separators pane:

<hr class="sigil_split_marker"/>

Then, when you’ve imported your HTML file, just press F6 and Sigil breaks up the file for you. If you start with a skeleton EPUB file, you can have a perfectly-formatted EPUB in a matter of minutes. Seeing as it takes me an entire evening to clean spurious classes out of Scrivener’s direct EPUB output, this is a gigantic step forward.

One thing to watch out for: MultiMarkdown inserts a tag, <meta charset="utf-8"/>, at the beginning of the HTML output. EPUB validators choke on this, insisting on an older version of this definition, but all you have to do is remove the line before you split the file.

Breaking Free

Perhaps you want to slip the surly bonds of Scrivener. Maybe your computer died, and your temporary replacement does not have Scrivener—but you saved a Markdown version of the latest in your Dropbox, and your beta readers are waiting.

Scrivener bundles its own copy of MultiMarkdown inside the app, so to use it outside Scrivener you’ll need to download and install MultiMarkdown yourself. It runs from the command line, which is not as scary as it sounds. In fact, Markdown and MultiMarkdown are very well suited to a distraction-free writing environment.

After you’ve installed MultiMarkdown, start a Terminal (or Command Line on that Microsoft thing). On OSX, press Cmd-Space to bring up Spotlight. Type term, and that should be enough for Spotlight to complete Terminal. If you prefer, you can start it directly from /Applications/Utilities.

Next, move to the right directory. For example, if your file is in Dropbox/fiction, type cd Dropbox/fiction (remember to reverse the slash on the Microsoft thing).

Here we go…

multimarkdown mybook.md >mybook.html

Now you have an HTML file that you can import into Sigil (just don’t forget to remove that pesky meta tag).
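If you’d rather not open the file just to delete one line, a quick command-line pass works too. Here’s a sketch using the BSD sed that ships with OSX (on Linux, drop the empty quotes after -i, and adjust the pattern if the tag is written slightly differently):

sed -i '' '/<meta charset="utf-8"\/>/d' mybook.html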

Silly CSS Tricks

Last week, I mentioned a couple of things you can do with CSS to help things along.

First, when you Compile your Scrivener project to MultiMarkdown, click Meta-Data in the options list. You should see some pre-filled options: Title, Author, and Base Header Level. Click the + above the Compile button to add a new entry. Call the entry CSS, then click in the text box below and enter ../Styles/styles.css—if you’re using Sigil, it puts all stylesheets in the Styles directory. You can give the file another name if you have a stylesheet pre-defined (mine is called novel.css).

Pre-define your CSS

Now open your stylesheet, or create one if you need to. Add the following entries:

p.sectionbrk {
    text-indent:0; text-align:center;
    margin-top:0.2em; margin-bottom:0.2em
}
.sectionbrk + p { text-indent: 0; }
h1 + p { text-indent: 0; }

The first entry formats the sectionbrk class to be centered, with some extra space above and below. The second one is more interesting: it cancels the text indent for the paragraph after a section break. The third entry does the same thing for a paragraph following a chapter heading (you can do this for h2 if needed as well). This is the proper typographical way to format paragraphs following headings or breaks, and you don’t have to go look for each one and do it yourself. I told you this can save a ton of time!
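If your interior scene or section titles come through as h2 headings, the same trick extends naturally:

h2 + p { text-indent: 0; }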

Again… Now What?

Now you can work with MultiMarkdown within Scrivener. You can export it, generate an eBook, and work with the file outside of Scrivener.

Next week, I’ll show you another way to make an eBook from your MultiMarkdown file.

Comments? Questions? Floor’s open!

Tuesday, February 21, 2017 1 comment

Tech Tuesday: Roll Your Own Writing System, Part 2: Markdown in Scrivener

Last week, I showed you a brief introduction to Markdown. I only hinted at why you might want to use Markdown instead of comfortable old bold/italic (and other decorations). I’ll get detailed next week, but here’s a hint: you can save yourself an entire evening of work getting your eBook prepared for publication.

This week, though, we’re going to look at how Scrivener and Markdown work together. TL;DR: Very well, actually.

Scrivener supports a Markdown extension called MultiMarkdown. You don’t have to worry about the extensions, unless you’re writing more technical fiction with tables and the like. For fiction, what I showed you last week should cover all but decorative stuff.

Make a copy of your WIP. Got it open? Original one is closed? Okay, let’s get started.

In Scrivener, click the Scrivenings icon in the toolbar, then click the Draft or Manuscript icon in your Binder (whichever one your story is in). You should now see your entire story laid out in Scrivener.

Click anywhere in the story text, then go to Format menu→Convert→Bold and Italics to MultiMarkdown Syntax. If you use anything other than bold/italic in your writing—like typewriter font for text messages, or blockquotes for letters—you’ll have to go through your manuscript and mark those yourself. This is that other 10% I mentioned last week.

Stylin’

Scrivener has formatting presets rather than true styles; it remembers only the formatting, not the preset name, after you apply one. Not as good as styles, but they work for our purposes.

Markdown uses backticks (a/k/a accent grave) to define typewriter font: `this is a text message`. You can either insert your backticks by hand, or let Scrivener insert them when you publish. I have a preset called Typewriter for this, but we can define a new preset or redefine an existing one. Here’s how it works: any string of text marked “Preserve Formatting” (Format menu→Formatting→Preserve Formatting) gets the backtick treatment at Compile time.

So go find a text message or other small string of typewriter text in your manuscript, and select it. Apply Preserve Formatting as described above, and the text gets highlighted in cyan or light blue.

Now, go Format menu→Formatting and:

  • for a new preset: New Preset from Selection
  • to redefine a preset: Redefine Preset from Selection→(preset name)

For a new preset, enter the name in the dialog box. In both cases, select Save Character Attributes in the dropdown to create a text (as opposed to a paragraph) preset. Now, any time you mark a selection of text as Typewriter (or TextMsg, or whatever you called it), you’ll see it highlighted and in your designated typewriter font.

Looks good, gets converted to backticks. What’s not to like?

To make a block quote, put a > at the beginning of each paragraph in the block, and on any blank lines in between. Add a blank line at the end of the blockquote so the next paragraph doesn’t get picked up as well. Scrivener assumes that preserved-format paragraphs are code blocks and displays them as-is, so you can’t use its Block Quote preset this way unless you turn off Preserve Formatting. Either way, you’ll have to add the > characters yourself.
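As a quick illustration (the letter text is made up), a block-quoted letter looks like this in Markdown:

> Dear Mr. Downs,
>
> We regret to inform you that your subscription has lapsed.

And the narrative picks back up here, after a blank line.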

Okay, ship it!

Not quite. There are still a few things you need to set up before you can get to the Efficiency Nirvana that Scrivener and MultiMarkdown offer.

To see where we need to go, let’s have a look at the output. In Scrivener, click Compile, then go to the Compile For: dropdown at the bottom of the compile window and select MultiMarkdown. You could also try MultiMarkdown→Web Page. Don’t forget to check which directory it’s going in, so you’ll be able to find it. Open it in a text editor (TextEdit, Notepad, whatever you like).

You should now see a few lines at the top with the story and author name, followed by the rest of the story. If you don’t use blank lines between paragraphs, your paragraphs run together in one big blob. There may not be any chapter titles, and likely no section breaks beyond blank lines. So let’s start fixing things. You’ll only have to do these once, or (at worst) once for each project.

Close the file, go back to Compile, and click Separators in the list. For Text Separator, click Custom and then enter the following:

<p class="sectionbrk">&bull; &bull; &bull;</p>

This tells Scrivener to put three bullets between each scene. (Anything Markdown or MultiMarkdown can’t do directly, you can do with HTML.) You’ll want to create or edit a CSS file to format the sectionbrk class the way you want (most people want it centered with a little space above and below). We’ll go over how to automatically link the CSS file to your HTML in a later post.

Set the other parts to Single Return. That’s all you have to do for Separators. In the other options:

  • Formatting: Check Title for Level 1 (and lower levels, if needed) folders.
  • Transformations: Check:
    • Straighten Quotes
    • Convert em-dashes
    • Convert ellipses
    • Convert multiple spaces
    • Convert to plain text: Paragraph Spacing
  • Replacements:
    • Replace (Option-Return twice); With (Option-Return)<p class="sectionbrk">&nbsp;</p>(Option-Return twice)

The Transformations section sounds a little scary, but MultiMarkdown re-converts those text entries to their nice typographical equivalents. I suggest you do it this way for more consistent results. The Replacements entry just inserts a blank section break that won’t get deleted during a conversion. You could just insert a non-breaking space, but (again) a later blog post will show you how you can use this to eliminate formatting issues.

Converting paragraph spacing to plain text replaces each paragraph break with two returns, inserting a blank line between paragraphs as Markdown expects. It works if your Body paragraph format puts space at the beginning or end of the paragraph. If you use indents instead, try “Paragraph Spacing and Indents” and hope the indents are deep enough for Scrivener to catch.

If that doesn’t work, add two more entries to Replacements:

  • Replace (Option-Return); With (Option-Return twice)
  • Replace (Option-Return four times); With (Option-Return twice)

The two replacements are needed because of a bug in Scrivener. It converts one return to four instead of two, but the second time through fixes it.

Now hit Compile, then open the generated file in a text editor. You should see a plain text file, with a blank line between each paragraph and Markdown syntax for various highlighting. You can go back into Scrivener and try MultiMarkdown→Web Page to see what that looks like, too.

Now What?

Now that you can export a clean MultiMarkdown file from Scrivener, you can work with it in any text editor. Sometimes, just looking at the same text in a different way is enough to get you moving on a WIP and get it done. If you have an iPad, you can still edit your Markdown-ified project using Scrivener on iOS, or you can use an iOS Markdown editor like Byword to edit your Markdown file (and import it back into your Scrivener project later).

But that’s only scratching the surface. Next week, we’ll start looking at ways to prep your MultiMarkdown file for beta or final publishing.

Comments? Questions? Floor’s open!
