5 Things You Can Do With DocumentAlchemy

4. Extract images from a PowerPoint deck.

Let's assume your company is creating a “media asset” library as part of a new content management initiative.

You have a bunch of PowerPoint presentations that employees have created over the past decade, and many of them have interesting charts, photos and diagrams that you'd like to break out as individual “assets” for the library.

How do you get at all of those images without opening each deck and copying each image out manually?

Well, one option is to take advantage of Microsoft's “export as HTML” feature in PowerPoint. The process differs a little between releases of PowerPoint, but the general idea is to (1) open the presentation, (2) use “Save As...” or similar to save the presentation as a “Web Page” or “Single File Web Page”, (3) go to the folder in which PowerPoint saved the HTML and copy all of the images to your central location, and (4) delete the rest of the export and repeat the process for the next deck.
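Steps (3) and (4) of that manual process can at least be scripted. Here is a minimal sketch, assuming the export ended up in a folder of its own; the folder name MY-DECK_files, the library path and the list of image extensions are illustrative guesses, since the layout depends on the PowerPoint release.

```shell
# Copy every image found under an export folder into a library folder.
# ASSUMPTION: the extensions below cover what your PowerPoint release writes.
collect_images() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  find "$src" -type f \
    \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.gif' \) \
    -exec cp {} "$dest"/ \;
}

# Usage (hypothetical folder names), one destination per deck so that
# files with the same name from different decks do not overwrite each other:
#   collect_images MY-DECK_files library/MY-DECK && rm -r MY-DECK_files
```

You would still have to open and export each deck by hand, which is the tedious part.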

This process is tedious at best. In Office 2010 the option to “Save as Web Page” is not even listed, although you can emulate the feature with a little bit of custom scripting.

DocumentAlchemy can help you extract images from PowerPoint (and other MS Office documents, as well as PDF) via REST API calls or from the command line.

For example, extract all of the images from a PowerPoint presentation named MY-DECK.PPTX with the following curl command:
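The call might look roughly like the sketch below. Treat everything service-specific in it as a placeholder rather than the documented DocumentAlchemy API: the DA_ENDPOINT and DA_AUTH_HEADER variables, the “document” form field and the idea that the response is a single archive are assumptions, so take the real values from the API documentation.

```shell
# Upload one document and save whatever the service returns.
# ASSUMPTIONS (hypothetical): DA_ENDPOINT holds the image-extraction URL,
# DA_AUTH_HEADER holds the full authorization header line, and the upload
# is a multipart form with a field named "document".
extract_deck_images() {
  curl --fail --silent --show-error \
    -H "$DA_AUTH_HEADER" \
    -F "document=@$1" \
    -o "$2" \
    "$DA_ENDPOINT"
}

# Usage, once both variables are set:
#   extract_deck_images MY-DECK.PPTX MY-DECK-images.zip
```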

(That method works with both PPTX and PPT files. For that matter, that method will work with any MS Office or PDF file.)

Here's a shell script that will perform that action for all files listed on the command line (including support for “globbing” patterns like *.pptx).
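A minimal sketch of such a script follows. As before, the DA_ENDPOINT and DA_AUTH_HEADER variables, the “document” form field and the ZIP response are assumptions, not the documented API. The output_name helper is likewise just an illustrative naming choice.

```shell
#!/bin/sh
# extract-images.sh: upload each file named on the command line and save the reply.
# ASSUMPTIONS (hypothetical): DA_ENDPOINT, DA_AUTH_HEADER, the "document"
# form field, and a ZIP archive as the response body.

# "decks/MY-DECK.PPTX" -> "MY-DECK.PPTX-images.zip"
# The extension is kept so that a.ppt and a.pptx do not share an output name.
output_name() {
  printf '%s-images.zip\n' "$(basename "$1")"
}

for file in "$@"; do
  # The shell has already expanded patterns such as *.pptx into file names.
  # An unmatched pattern arrives as literal text, so skip anything that is not a file.
  if [ ! -f "$file" ]; then
    echo "skipping $file: not a file" >&2
    continue
  fi
  : "${DA_ENDPOINT:?set DA_ENDPOINT to the image-extraction URL}"
  : "${DA_AUTH_HEADER:?set DA_AUTH_HEADER to the authorization header}"
  out=$(output_name "$file")
  if curl --fail --silent --show-error \
      -H "$DA_AUTH_HEADER" -F "document=@$file" -o "$out" "$DA_ENDPOINT"; then
    echo "$file -> $out"
  else
    echo "failed: $file" >&2
  fi
done
```

Each archive is written to the current directory, so decks with the same file name in different folders would still overwrite one another.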

Save the script as extract-images.sh, make sure it is executable (chmod a+x extract-images.sh) and run it like this:

./extract-images.sh *.ppt *.pptx

to extract all of the images embedded in a set of Microsoft Office files.

EDIT: See “Extract images from a PowerPoint deck.” in “Five Things You Can Do With the DocumentAlchemy API - Command-Line Interface Edition” for an even easier interface to this functionality.

