In typography, paired quotation marks were traditionally oriented toward one another. They look like this:
“smart quotes”
As computers became popular in the mid-twentieth century, the orientation was often abandoned. Early computer character sets didn't have much room to spare, so it makes sense that two double quotes and two single quotes were reduced to just one of each in the ASCII specification. These days the common character set is Unicode, with plenty of space for lots of fancy quotation marks and apostrophes, but many people have become used to the minimalism of just one character for both opening and closing quotes. Besides that, computers actually see the different kinds of quotation marks and apostrophes as distinct characters. In other words, to a computer the right double quote is different from the left double quote or a straight quote.
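One way to see that these really are distinct characters is to look at the bytes behind them. The sketch below assumes a UTF-8 environment and uses a helper I've named show_bytes purely for illustration; it isn't part of any standard tool.

```shell
# show_bytes is a hypothetical helper: it prints the bytes of its
# argument in hexadecimal, with od's spacing stripped out.
show_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

show_bytes '"'                          # straight double quote: prints 22
show_bytes "$(printf '\342\200\234')"   # left double quote, U+201C: prints e2809c
show_bytes "$(printf '\342\200\235')"   # right double quote, U+201D: prints e2809d
```

The straight quote is a single ASCII byte, while each curly quote is a three-byte UTF-8 sequence, and the left and right variants differ in their last byte.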
Replacing smart quotes with sed
Computers aren't typewriters. When you press a key on your keyboard, you're not pressing a lever with an inkstamp attached to it. You're just pressing a button that sends a signal to your computer, which the computer interprets as a request to display a specific predefined character. Which character gets requested depends on your keyboard map. As a Dvorak typist, I've witnessed the confusion on people's faces when they discover "asdf" on my keyboard produces "aoeu" on the screen. You may also have pressed special combinations of keys to produce characters, such as ™ or ß or ≠, that aren't even printed on your keyboard.
Each letter or character, whether it's printed on your keyboard or not, has a code. Character encoding can be expressed in different ways, but to a computer the Unicode code points U+2018 and U+2019 (written \u2018 and \u2019 in many tools) produce ‘ and ’, while U+201C and U+201D produce “ and ”. Knowing these "secret" codes means you can replace the characters programmatically using a command like sed. Any version of sed will do, so you can use GNU sed or BSD sed or even BusyBox sed.
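As a quick illustration of the idea, here's a minimal sketch that builds one of these characters from its code and swaps it out on a pipeline. It assumes UTF-8 text, and it spells U+2019 as its three UTF-8 bytes in octal because printf handles octal escapes the same way in every POSIX shell. The variable name RSQUO is just my choice for this example.

```shell
# Build the right single quote (U+2019) from its UTF-8 bytes: e2 80 99.
RSQUO=$(printf '\342\200\231')

# Replace every occurrence with a straight apostrophe and print the result.
printf 'It%ss a test\n' "$RSQUO" | sed "s/$RSQUO/'/g"
# prints: It's a test
```

Nothing is written to disk here, so this is a safe way to try a substitution before pointing sed at a real file.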
Here's the simple shell script I use:
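#!/bin/sh
# GNU All-Permissive License
# Usage: fixquotes.sh FILE
# Build the curly quotes from their UTF-8 bytes; printf does this in any POSIX shell
SDQUO=$(printf '\342\200\230\342\200\231')  # U+2018 and U+2019
RDQUO=$(printf '\342\200\234\342\200\235')  # U+201C and U+201D
# Edit through a temporary file, because the -i option differs between GNU and BSD sed.
# The bracket expressions assume a UTF-8 locale.
TMP=$(mktemp) || exit 1
sed -e "s/[$SDQUO]/'/g" -e "s/[$RDQUO]/\"/g" "${1}" > "$TMP" && cat "$TMP" > "${1}"
rm -f "$TMP"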
Save this script as fixquotes.sh, and then create a separate test file, test.txt, containing smart quotes:
‘Single quote’
“Double quote”
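If you want to confirm that a file really contains curly quotes, before or after a conversion, you can count the lines that hold one. This is a minimal sketch: count_curly is a hypothetical helper name, and the sample file is created with mktemp only so the example is self-contained.

```shell
# count_curly is a hypothetical helper: it prints how many lines of the
# named file contain at least one curly single or double quote.
count_curly() {
    lsq=$(printf '\342\200\230'); rsq=$(printf '\342\200\231')  # U+2018, U+2019
    ldq=$(printf '\342\200\234'); rdq=$(printf '\342\200\235')  # U+201C, U+201D
    # -F treats each pattern as a fixed string; -c prints the count of matching lines
    grep -c -F -e "$lsq" -e "$rsq" -e "$ldq" -e "$rdq" "$1"
}

# Demonstration on a throwaway file with one curly-quoted line and one plain line.
sample=$(mktemp)
printf '\342\200\234Double quote\342\200\235\nplain line\n' > "$sample"
count_curly "$sample"   # prints 1
rm -f "$sample"
```

Run against the two-line test file above, count_curly test.txt should report 2. Note that grep exits with a non-zero status when the count is 0, which matters if you call this from a script that stops on errors.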
Run the script, and then use the cat command to see the results:
$ sh ./fixquotes.sh test.txt
$ cat test.txt
'Single quote'
"Double quote"
Install sed
If you’re using Linux, BSD, or macOS, then you already have GNU or BSD sed installed. These are two unique reimplementations of the original sed command, and for the script in this article they are functionally the same (that's not true for all scripts, though).
On Windows, you can install GNU sed with Chocolatey.