5 Things - The DocumentAlchemy CLI Edition

the DocumentAlchemy blog

 

Five Things You Can Do With the DocumentAlchemy API - Command-Line Interface Edition

Last month we blogged about “Five Things You Can Do With the DocumentAlchemy API”, and provided source code examples for doing each of them.

Today we'll revisit those tasks and demonstrate how to complete them using the DocumentAlchemy Command-Line Interface, an open-source program that makes it easy to use the document processing API from the command line.

1. Upgrade your old *.doc files to *.docx.

Beginning with Office 2007, Microsoft changed the format for Word documents from the proprietary .doc binary format to a standards-based .docx XML format.

Since DocumentAlchemy understands both the .doc and .docx formats, you can use the API to “upgrade” files from the old (binary) format to the new (zipped-XML) format.

To convert a single .doc file into a .docx file, use the following command:

document-alchemy convert MY-DOCUMENT.doc --to docx --out MY-DOCUMENT.docx

To convert many .doc files into .docx files (in a batch), you can use a shell script like the following:

#!/bin/bash
# Converts one or more *.doc files into *.docx

# USAGE: doc2docx.sh <DOC_FILE>
#        doc2docx.sh <DIRECTORY>/*.doc
# Errors are reported, but do not stop further processing.
# Exits with `0` if the given files were converted correctly,
# or with the number of failing files.

# This is a flag that can prevent this script from `echo`ing
# unnecessary information.  It can be set via the environment
# variable `QUIET`.
# A clever person could make this into a command line parameter
# like `-q`.
# (`false` must be lowercase: `$QUIET || echo ...` runs it as a command.)
QUIET=${QUIET:-false}

# EXIT_CODE tracks the number of input documents we couldn't convert.
EXIT_CODE=0

# Loop over the command line parameters....
for doc in "$@"; do
  # ...testing that it is an accessible, non-empty file...
  if ! [ -s "$doc" ]; then
    $QUIET || echo "WARNING: File '$doc' was not found and will be ignored."
    EXIT_CODE=$((EXIT_CODE+1))
  else
    # ...and that the filename ends with `.doc`...
    docx="`echo "$doc" | sed 's/\.doc$/\.docx/'`"
    if [[ "$doc" == "$docx" ]]; then
      $QUIET || echo "WARNING: File '$doc' does not end in '.doc' and will be ignored."
      EXIT_CODE=$((EXIT_CODE+1))
    else
      # ...if so, POST to DocumentAlchemy to convert the file...
      $QUIET || echo "Converting '$doc' into '`basename "$docx"`'...";
      document-alchemy convert "$doc" --to docx --out "$docx"
      # Capture the exit code right away; the `[` test below would overwrite `$?`.
      status=$?
      # ...and report success or failure.
      if [ "$status" -ne 0 ]; then
        $QUIET || echo "WARNING: Encountered a non-zero exit code for file '$doc'. Found $status.";
        EXIT_CODE=$((EXIT_CODE+1))
      else
        $QUIET || echo "...OK. File '$docx' created."
      fi
    fi
  fi
done

# Exit with the number of documents that could not be converted.
exit $EXIT_CODE

Save the script as doc2docx.sh, make sure it is executable (chmod a+x doc2docx.sh) and run it like this:

./doc2docx.sh MyDocuments/*.doc

For every .doc file enumerated on the command line, a sibling .docx file will be created (or overwritten!) containing the DOCX equivalent.
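The comments in the script suggest turning QUIET into a command-line parameter like -q. One way to do that is with the shell's built-in getopts. What follows is a minimal sketch, not part of the original script: the function name parse_quiet_flag and the variable ARGS_TO_SKIP are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Sketch only: parse_quiet_flag and ARGS_TO_SKIP are illustrative names.

# Sets QUIET=true when -q is given, and records in ARGS_TO_SKIP how many
# leading arguments were options so that the caller can shift them away.
parse_quiet_flag() {
  OPTIND=1                  # restart option parsing on every call
  QUIET=${QUIET:-false}     # the environment variable still works as a default
  while getopts ":q" opt; do
    case "$opt" in
      q) QUIET=true ;;
      *) echo "USAGE: $0 [-q] <FILES>" >&2; return 64 ;;
    esac
  done
  ARGS_TO_SKIP=$((OPTIND - 1))
}

# Near the top of doc2docx.sh, in place of the QUIET=... line:
#   parse_quiet_flag "$@" || exit $?
#   shift "$ARGS_TO_SKIP"
```

With that in place, ./doc2docx.sh -q MyDocuments/*.doc would run silently, while the QUIET environment variable keeps working as before.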

Also see our previous post for scripts that use curl instead of the DocumentAlchemy CLI.


2. Render Markdown as HTML, PDF or Microsoft Word.

Markdown is a handy plain-text format for authoring rich-text documents, relying on pre-existing conventions for styling text in plain-text, like _underscores_ for emphasis and **stars** for bold.

Markdown was designed to be converted to HTML, but it is also commonly rendered in other rich-text formats.

DocumentAlchemy, and the document-alchemy command-line application, can render Markdown as Microsoft Word, HTML or PDF documents.

To convert a Markdown file into an MS Word file, use the following command:

document-alchemy convert MY-DOC.md --to docx --out MY-DOC.docx

The commands for converting Markdown to HTML and Markdown to PDF are similar:

document-alchemy convert MY-DOC.md --to html --out MY-DOC.html
document-alchemy convert MY-DOC.md --to pdf --out MY-DOC.pdf

Unlike in our previous post, there is no need for a helper script here: the DocumentAlchemy CLI is at least as easy to use as the earlier shell script, right out of the box.
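When all three renditions of the same Markdown file are wanted at once, a small loop saves some typing. The following is a minimal sketch rather than a feature of the CLI. The function name render_markdown and the DA variable are assumptions introduced here; DA exists only so the command can be swapped out for a dry run. The sketch relies solely on the convert, --to and --out forms shown above.

```shell
#!/bin/bash
# Sketch only: render_markdown and DA are illustrative names.

# The command to run; override it (for example DA=echo) for a dry run.
DA=${DA:-document-alchemy}

# Renders one Markdown file as .docx, .html and .pdf siblings.
# Returns the number of conversions that failed (0 means all succeeded).
render_markdown() {
  md=$1
  base=${md%.md}            # strip a trailing ".md", if there is one
  failures=0
  for fmt in docx html pdf; do
    "$DA" convert "$md" --to "$fmt" --out "$base.$fmt" || failures=$((failures + 1))
  done
  return "$failures"
}

# Example:
#   render_markdown MY-DOC.md
```

Setting DA=echo before calling render_markdown prints the three commands instead of running them, which is a cheap way to check the generated file names.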


3. Resize images to a fixed width.

To display a set of irregularly-sized images in a grid or other orderly layout, it is helpful to resize them to all have the same width, even if they have variable height.

DocumentAlchemy, and the document-alchemy command-line application, can do that for you.

To resize a single image to a fixed width of 240 pixels, use the following command:

document-alchemy transform IMAGE.jpg resize 240 0 > IMAGE-240.jpg

To resize many images to the same fixed width, you can use a script like the following:

#!/bin/bash
# Resizes one or more images to a fixed width using the DocumentAlchemy API

# USAGE: resize-image.sh <NEW-WIDTH> <FILES>
#
# EXAMPLE: resize-image.sh 180 images/*.png

# The generated files will have the same name (and location) as their source
# image, but with '-resized' inserted between the filename and the extension.

# This is a flag that can prevent this script from `echo`ing
# unnecessary information.  It can be set via the environment
# variable `QUIET`. A clever person could make this into a
# command line parameter like `-q`.
# (`false` must be lowercase: `$QUIET || echo ...` runs it as a command.)
QUIET=${QUIET:-false}

# utility function to show simple help
function usage {
  echo "USE: $0 <SIZE> <IMAGES>";
}

# EXIT_CODE tracks the number of IMAGES we couldn't resize.
EXIT_CODE=0

# The first argument contains the desired size:
NEW_WIDTH=$1

# The rest will be filenames, so "shift" the first argument away and process the rest.
shift;

# Loop over the command line parameters....
for img in "$@"; do
  # ...testing that it is an accessible, non-empty file...
  if ! [ -s "$img" ]; then
    $QUIET || echo "WARNING: File '$img' was not found and will be ignored."
    EXIT_CODE=$((EXIT_CODE+1))
  else
    # ...if so, POST to DocumentAlchemy to resize the image...
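    # Insert '-resized' before the extension, keeping the extension itself.
    outfile="`echo "${img}" | sed -E 's/(\.[A-Za-z0-9]+)$/-resized\1/'`";
    $QUIET || echo "Resizing '$img' into '`basename "$outfile"`'...";
    document-alchemy transform "$img" resize "$NEW_WIDTH" 0 -o "$outfile"
    # Capture the exit code right away; the `[` test below would overwrite `$?`.
    status=$?
    if [ "$status" -ne 0 ]; then
      $QUIET || echo "WARNING: Encountered a non-zero exit code. Found $status.";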
      EXIT_CODE=$((EXIT_CODE+1))
    else
      $QUIET || echo "...OK. File '$outfile' created."
    fi
  fi
done

# Exit with the number of images that could not be resized.
exit $EXIT_CODE

Save the script as resize-image.sh, make sure it is executable (chmod a+x resize-image.sh) and run it like this:

./resize-image.sh 180 foo.png

to generate a rendition of foo.png that is at most 180 pixels wide, or:

./resize-image.sh 240 *.png dir-one/*.png dir-two/*.png

to generate a 240-pixel-wide rendition of each PNG image found in the specified directories.
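The naming rule described in the script's header (insert '-resized' between the file name and its extension) can also be expressed with shell parameter expansion instead of sed. A minimal sketch, in which resized_name is a hypothetical helper name introduced for illustration:

```shell
#!/bin/bash
# Sketch only: resized_name is an illustrative helper, not part of the CLI.

# Prints the output name for a source image, e.g.
# images/foo.png becomes images/foo-resized.png
resized_name() {
  case "${1##*/}" in          # look at the file name only, not the directories
    ?*.*) printf '%s-resized.%s\n' "${1%.*}" "${1##*.}" ;;  # has an extension
    *)    printf '%s-resized\n' "$1" ;;                      # no extension
  esac
}

# In the loop, this could replace the sed pipeline:
#   outfile="$(resized_name "$img")"
```

This version never returns the source name unchanged, so the source image is not overwritten when a file name has no extension.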

Also see our previous post for scripts that use curl instead of the DocumentAlchemy CLI.


4. Extract images from a PowerPoint deck.

DocumentAlchemy, and the document-alchemy command-line application, offer a way to get at the images embedded within a Microsoft Office or PDF document. For example:

To extract all images from a single MS Office or PDF document, you can use a command like the following:

document-alchemy convert MY-DECK.pptx --to images.zip --out IMAGES-FROM-MY-DECK.zip

To extract the images from many Office or PDF documents at once, you can use a script like the following:

#!/bin/bash
# Extracts images from Microsoft Office or PDF files using the DocumentAlchemy API.

# USAGE: extract-images.sh <FILES>
#
# EXAMPLE: extract-images.sh MyDeck.pptx *.doc

# A ZIP archive containing the extracted images (if any) will be
# created for each document submitted.

# This is a flag that can prevent this script from `echo`ing
# unnecessary information.  It can be set via the environment
# variable `QUIET`. A clever person could make this into a
# command line parameter like `-q`.
# (`false` must be lowercase: `$QUIET || echo ...` runs it as a command.)
QUIET=${QUIET:-false}

# EXIT_CODE tracks the number of files we couldn't extract from.
EXIT_CODE=0

# Loop over the command line parameters....
for doc in "$@"; do
  # ...testing that it is an accessible, non-empty file...
  if ! [ -s "$doc" ]; then
    $QUIET || echo "WARNING: File '$doc' was not found and will be ignored."
    EXIT_CODE=$((EXIT_CODE+1))
  else
    # ...if so, POST to DocumentAlchemy to extract the images...
    outfile="`dirname "$doc"`/images-from-`basename "$doc"`.zip";
    $QUIET || echo "Extracting images from '$doc' into '`basename "$outfile"`'...";
    document-alchemy convert "$doc" --to images.zip --out "$outfile"
    # Capture the exit code right away; the `[` test below would overwrite `$?`.
    status=$?
    # ...and report success or failure.
    if [ "$status" -ne 0 ]; then
      $QUIET || echo "WARNING: Encountered a non-zero exit code. Found $status.";
      EXIT_CODE=$((EXIT_CODE+1))
    else
      $QUIET || echo "...OK. File '$outfile' created."
    fi
  fi
done

# Exit with the number of documents that could not be converted.
exit $EXIT_CODE

Save the script as extract-images.sh, make sure it is executable (chmod a+x extract-images.sh) and run it like this:

./extract-images.sh *.ppt *.pptx

to extract all of the images embedded in a set of Microsoft Office files.

Also see our previous post for scripts that use curl instead of the DocumentAlchemy CLI.


5. Add a cover page to a PDF document.

DocumentAlchemy, and the document-alchemy command-line application, can combine two or more Microsoft Office (Word, Excel or PowerPoint) and PDF documents into a single PDF. For example:

To add a cover page to a single PDF document, or more generally to combine two PDF or MS Office documents, you can use a command like the following:

document-alchemy join COVER.pdf BODY.pdf > COVERED-BODY.pdf

To add a cover page to many PDF documents at once, you can use a script like this:

#!/bin/bash
# Adds a cover page to one or more PDF documents.

# USAGE: add-cover.sh <COVER-PAGE> <FILES>
#
# EXAMPLE: add-cover.sh cover.pdf doc/*.pdf

# This is a flag that can prevent this script from `echo`ing
# unnecessary information.  It can be set via the environment
# variable `QUIET`. A clever person could make this into a
# command line parameter like `-q`.
# (`false` must be lowercase: `$QUIET || echo ...` runs it as a command.)
QUIET=${QUIET:-false}

# EXIT_CODE tracks the number of files we couldn't add a cover page to.
EXIT_CODE=0


# The first argument contains the cover page.
COVER_PAGE=$1
# The rest will be filenames, so "shift" the first argument away and process the rest.
shift;

# Loop over the command line parameters....
for pdf in "$@"; do
  # ...testing that it is an accessible, non-empty file...
  if ! [ -s "$pdf" ]; then
    $QUIET || echo "WARNING: File '$pdf' was not found and will be ignored."
    EXIT_CODE=$((EXIT_CODE+1))
  else
    # ...if so, POST to DocumentAlchemy to join the documents
    outfile="`dirname "$pdf"`/covered-`basename "$pdf"`"
    $QUIET || echo "Adding cover page '$COVER_PAGE' to '$pdf'...";
    document-alchemy join "$COVER_PAGE" "$pdf" --out "$outfile"
    # Capture the exit code right away; the `[` test below would overwrite `$?`.
    status=$?
    # ...and report success or failure.
    if [ "$status" -ne 0 ]; then
      $QUIET || echo "WARNING: Encountered a non-zero exit code. Found $status.";
      EXIT_CODE=$((EXIT_CODE+1))
    else
      $QUIET || echo "...OK. File '$outfile' created."
    fi
  fi
done

# Exit with the number of documents that could not be joined.
exit $EXIT_CODE

Save the script as add-cover.sh, make sure it is executable (chmod a+x add-cover.sh) and run it like this:

./add-cover.sh cover-page.pdf documents/*.pdf

to create a copy of each PDF file in documents with cover-page.pdf added at the beginning of each.

Note that the generated files will be found in the same directory as the source file, with the string covered- prefixed to the file name.

Also see our previous post for scripts that use curl instead of the DocumentAlchemy CLI.
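The four batch scripts in this post share the same skeleton: skip missing files, run one command per file, and count the failures. A sketch of how that skeleton could be factored into a reusable function follows. The names for_each_file and add_cover are hypothetical, and the cap at 255 is an addition of this sketch: shell exit statuses are limited to the range 0 to 255, so a larger failure count would otherwise wrap around.

```shell
#!/bin/bash
# Sketch only: for_each_file and add_cover are illustrative names.

# Runs HANDLER once per file, skipping missing or empty files.
# Returns the number of files that were skipped or failed (capped at 255).
for_each_file() {
  handler=$1
  shift
  failures=0
  for f in "$@"; do
    if ! [ -s "$f" ]; then
      echo "WARNING: File '$f' was not found and will be ignored." >&2
      failures=$((failures + 1))
    elif ! "$handler" "$f"; then
      echo "WARNING: '$handler' failed for file '$f'." >&2
      failures=$((failures + 1))
    fi
  done
  if [ "$failures" -gt 255 ]; then
    failures=255              # exit statuses cannot exceed 255
  fi
  return "$failures"
}

# Example handler: same join command and naming as add-cover.sh above.
add_cover() {
  document-alchemy join "$COVER_PAGE" "$1" --out "$(dirname "$1")/covered-$(basename "$1")"
}

# Example:
#   COVER_PAGE=cover-page.pdf
#   for_each_file add_cover documents/*.pdf
#   exit $?
```

Each of the earlier scripts would then reduce to a short handler function plus one call to for_each_file.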


About DocumentAlchemy

DocumentAlchemy is a RESTful web service that developers can use to add document generation, conversion and processing features to their app.

Visit our live demonstrations or interactive API reference to explore DocumentAlchemy's rapidly growing API for working with MS Office documents, PDFs, images, web pages and more.


Copyright © 2017 DocumentAlchemy.