Four ML models that accelerate content creation

I love flashy ML demos. Seeing a mind-blowing model produce novel results fills me with wonder and curiosity. While machine learning as a field produces the “wow factor” on a regular basis, a big challenge is going from an impressive demo to a useful product. Many models promise broad value, but today I’m going to think through four specific tasks in my content creation process and the models that could accelerate that work and improve its outcomes.

As a technical writer, I create all sorts of content. ML has promising applications across audio and video, but let’s focus on written articles like the one you’re reading right now. Writing this article is a multi-step process: I develop a topic and outline, research and structure my thoughts, draft and revise the piece, add graphics and images, and finally format and distribute the finished work. I want ML to help me:

  • Create great content with less effort

  • Create things I couldn’t otherwise make

  • Reach a broader audience

As with any tool, I need to make sure ML tools are actually helpful. I don’t want to produce high volumes of stuff people don’t want or let AI introduce errors into my work. Even without ML tools, the sheer scale of the internet means that staggering amounts of content are produced every day. AI tools are great at producing output, but it’s still up to me to use my taste to curate what I add to my work.

Research: Speech to text models

My favorite type of source is an interview with a subject-matter expert. However, when I’m creating my outline, I like to have the interview as text so that I can skim it quickly and copy quotes. While I’m a competent typist, I can only transcribe fifteen to twenty minutes of audio per hour of focused work, which makes transcription the most time-consuming single step of many projects.

Even the best speech-to-text models and services make plenty of errors. But for my own use in the drafting process, automatic transcription has gotten good enough. A pass to clean up issues is a great deal faster than manual transcription, even for transcripts that will be published in their entirety. Automatic transcription has the cleanest premise of any of these ML models: it simply saves time.
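That cleanup pass can itself be partly scripted. Here’s a minimal sketch of the idea: a small corrections table for the mistakes my transcripts tend to repeat, applied before I read through by hand. The corrections and the sample sentence are made-up illustrations, not output from any particular speech-to-text service.

```python
# Illustrative cleanup pass over an automatic transcript.
# The corrections table below is a hypothetical example of errors
# an ASR service might make, not real output from any product.

import re

CORRECTIONS = {
    "super vised": "supervised",  # split compound words
    "pie torch": "PyTorch",       # misheard proper nouns
}

def clean_transcript(text: str) -> str:
    """Apply known corrections, then collapse stray whitespace."""
    for wrong, right in CORRECTIONS.items():
        # Word boundaries keep replacements from firing inside other words.
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

print(clean_transcript("We fine-tuned a super vised  model in pie torch."))
# → We fine-tuned a supervised model in PyTorch.
```

A script like this only catches the predictable errors, of course; the final human pass is still where the curation happens.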

Drafting: Next word prediction models

Much has been made of GPT-3’s ability to create coherent, relevant text. But I believe that the model’s best applications lie outside of creating long-form text. I only write an article when I have something to say, and call me old-fashioned, but I like to be the one saying it. Still, I use a humbler form of text generation in my daily work: next-word prediction and autocomplete.

In Google Docs, where I am drafting this article, there’s an advanced autocomplete feature that occasionally kicks in and suggests several words to complete a phrase. Beyond saving a few keystrokes on long words, this feature guesses the next few words I intended to type with surprising accuracy.

Returning to curation, I have a perhaps uncommon use for these next-word suggestions. If I’m typing a sentence and the next few words I planned to say pop up, I take that as a sign that I need to use more creative syntax. If the phrase is commonplace enough that the model predicts it, I’m challenged to elevate my writing. This adversarial use of an ML tool saves my editors and my readers from the clichés that I am most prone to using.
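The intuition behind those suggestions can be sketched with a toy bigram model: count which word most often follows each word in some training text, then propose it. This is a deliberately tiny stand-in for production autocomplete, which uses far larger models and longer context; the corpus here is invented for illustration.

```python
# Toy next-word predictor: count bigrams in a small corpus and
# suggest the most frequent follower of a given word.
# A minimal stand-in for real autocomplete systems.

from collections import Counter, defaultdict

def train_bigrams(corpus: str):
    """Map each word to a Counter of the words that follow it."""
    words = corpus.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def suggest(followers, word):
    """Return the most common next word, or None if the word is unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

corpus = (
    "machine learning models need data "
    "machine learning models need compute "
    "machine learning tools need curation"
)
model = train_bigrams(corpus)
print(suggest(model, "learning"))  # → models
```

Used adversarially, a predictor like this flags exactly the phrases that are common enough to be predictable, which is the signal I use to rewrite them.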

Illustrating: Image generation models

Readers bounce off walls of text like a gymnast from a trampoline. Blog posts employ headers, lists, and other structural text elements to make content more skimmable and engaging, but nothing catches a casual skimmer’s eye quite like a graphic.

Also, a graphic with a cat on it.

I am no illustrator or artist, and my graphic design skills extend as far as making minor tweaks to established templates in Figma. Image generation models like OpenAI’s DALL·E 2 and Google’s Imagen represent a near future where I no longer have to rely on the same dozen stock photos that everybody uses to quickly and inexpensively adorn my work. Instead, I’ll be able to generate original, relevant images that enhance reader engagement and understanding.

Distribution: Natural language translation models

I speak Spanish well enough to order at a restaurant or get directions across a city, but I’d have a lot of trouble learning a technical topic from an article written in Spanish. Many technologists learn English as a second, third, or Nth language, but freely available tools like Google Translate improve in translation quality year after year. ML-based internationalization is already here, and it’s only getting better.

As a writer, I want to reach as many people as possible with my work, especially when I’m writing professionally. While the grammatical accuracy, context preservation, and overall fluidity of machine-translated writing are still far from ideal, I look forward to one day knowing that anything I write can be read by anyone in the world, and that I can read what they create.

Conclusion on curation

Of the four types of models discussed here, I’m somewhat skeptical of the capabilities of text generation while enthusiastic about the contributions the other models make. I’m a professional writer, so I see the most shortcomings in the model that deals with writing. I’m sure a professional transcriber, illustrator, or translator would find equally nuanced faults in the models that mimic their work. Despite their limitations, ML tools are a compelling complement to a human’s taste and curation, and can help writers create better content more quickly and for more people.