How Visual and Voice Search Are Revitalising The Role of SEO

The component parts of a successful search engine optimization (SEO) strategy may have remained relatively constant, but their definition and purpose have changed entirely. Driven by trends like visual search and voice search, the industry’s scope has expanded and evolved into something more dynamic.

This delivers on a genuine consumer need. According to one report, 74 percent of shoppers say that text-only search is insufficient for finding the products they want.

It is unsurprising that Gartner research predicts that by 2021, early adopter brands that redesign their websites to support visual and voice search will increase digital commerce revenue by as much as 30 percent. Through visual and voice search, marketers can engage more meaningfully with their audience at each stage of their purchase journey. This means moving beyond the static websites of old toward more interactive experiences that can be accessed anywhere, any time, on any device.

Sensory search

Search visibility still matters, but the concept of “rankings” is hard to pin down when we factor in the proliferation of Internet of Things (IoT) devices and the machine learning algorithms that fine-tune the search results.

Brands’ content must be relevant to a query, but those queries are getting more specific and contextual; relevance must be combined with usefulness in the moment.

Underpinning these shifts are two trends that are revitalizing the search industry: visual search and voice search. Though these are linked and can be grouped under the umbrella of “sensory search,” they are separate disciplines with different implications for search marketers.

For brands that engage early by implementing technical best practices and adapting their SEO strategies, these trends represent some of the foremost opportunities in digital marketing for the coming years.

Visual search

For many years, Google has let users upload an image or enter an image URL in the search toolbar within Google Images to generate a search engine results page (SERP).

The next generation of visual search turns a smartphone camera into a visual discovery tool. It can use an image as a search query, which allows consumers to search for styles and objects that they would otherwise struggle to define. The most popular visual search technologies are Google Lens and Pinterest Lens, but Amazon, Bing and a growing list of major retailers are all investing heavily in this area. Visual search is also a building block for augmented reality and virtual reality interactions.

There is a growing body of evidence to substantiate the claim that this technology is taking off with consumers, too:

According to Grand View Research:

The global image recognition market size was valued at USD 16.0 billion in 2016 and is likely to expand at a compound annual growth rate of 19.2 percent from 2017 to 2025.

This is still a sizeable opportunity for retailers, too, as only 32 percent are either already using artificial intelligence (AI) for visual search or plan to do so within the next year.

Visual search optimization tips

Here are some tips to help you optimize for visual search:

  • Add multiple images to each product or topic page.
  • Optimize the images for the web and for fast page loads.
  • Consider adding raster images that include a message and a call to action (CTA) in the photo itself, so the image is more compelling when viewed in Google Images or repurposed elsewhere.
  • Upload image XML (Extensible Markup Language) sitemaps and ensure that product inventory is updated across all search engines and retailers.
  • Maintain a logical site hierarchy that is connected through relevant internal links.
  • Make sure your images are hosted on authoritative pages that respond to a specific user intent.
  • Map keyword categories and themes to your images, and then use these queries to optimize image alt tags, titles and captions. Put relevant keywords in the image file name.
  • Develop a unique brand aesthetic across all visual assets. This will help search engines relate your brand to a particular style.
  • If you use stock images, tailor them so they are not identical to the hundreds of other instances of the same image. Search engines will find it difficult to understand your image if it is replicated across the web in different contexts.
  • Although visual search reporting is still very limited, keep a close eye on your image search traffic to keep track of any increases in demand.
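
As a rough illustration of the file-name and sitemap tips above, the following Python sketch builds a keyword-based image file name and a small image sitemap. It uses the sitemap namespace and the Google image sitemap extension namespace; the function names, URLs and keywords are hypothetical placeholders, not part of any real site or tool.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def keyword_file_name(keywords: str, extension: str = "jpg") -> str:
    """Turn a keyword phrase into a lowercase, hyphenated image file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", keywords.lower()).strip("-")
    return f"{slug}.{extension}"


def build_image_sitemap(pages: dict[str, list[str]]) -> str:
    """Return sitemap XML that lists each image URL under its page URL."""
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical product page with one keyword-named image.
    file_name = keyword_file_name("Red Leather Boots")
    print(build_image_sitemap({
        "https://example.com/boots": [f"https://example.com/img/{file_name}"],
    }))
```

In practice the page and image URLs would come from a product catalogue, so the sitemap can be regenerated whenever inventory changes.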

Voice search

Voice search has had much more publicity than its visual counterpart, fronted by glitzy demonstrations from the likes of Apple, Google and Amazon. In the “age of assistance,” it seems voice will be the preferred mode of access to AI-driven devices. Undoubtedly, some impressive statistics substantiate this claim:

Sixty-five percent of people who own an Amazon Echo or Google Home can’t imagine going back to the days before they had a smart speaker.

Voice commerce sales reached $1.8 billion in the US last year and are predicted to reach $40 billion by 2022.

Fifty-two percent of voice-activated speaker owners would like to receive information about deals, sales and promotions from brands.

These are still experimental times for voice search, and many brands are trying to ascertain just how much it will affect their industry. As with visual search, reporting is limited at the moment, but there are still plenty of opportunities for innovation. Brands need to think about how they want to sound, rather than just look. Voice search naturally opens up conversations, and it is certainly possible to foresee a future where digital assistants relay messages directly from brands, rather than just reading the text.

A step in this direction is the launch of the Speakable structured data format, now available in beta. Although it’s only available for news at the moment, it will surely open up to other industries after this test period.

Voice search optimization tips

Google’s guidelines highlight some important points for any brand that wishes to optimize for voice search:

  • Content indicated by speakable structured data should have concise headlines and/or summaries that provide users with comprehensible and useful information.
  • If you include the top of the story in speakable structured data, we suggest that you rewrite the top of the story to break up information into individual sentences so that it reads more clearly for text to speech (TTS).
  • For optimal audio user experiences, we recommend around 20 to 30 seconds of content per section of speakable structured data, or roughly two to three sentences.
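
To make the guidelines above more concrete, here is a minimal Python sketch that checks whether a summary fits the suggested length and then builds Speakable markup as JSON-LD. The speaking rate of 150 words per minute is an assumption for illustration, not a figure from Google, and the function names, page name, URL and CSS selectors are hypothetical.

```python
import json
import re

WORDS_PER_MINUTE = 150  # assumed speaking rate, not a figure from Google


def estimated_seconds(text: str) -> float:
    """Estimate how long text takes to read aloud at the assumed rate."""
    return len(text.split()) * 60 / WORDS_PER_MINUTE


def sentence_count(text: str) -> int:
    """Count sentences by splitting after ., ! or ? followed by whitespace."""
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def fits_guideline(text: str, max_seconds: float = 30.0,
                   max_sentences: int = 3) -> bool:
    """Check a section against the 20-to-30-second, two-to-three-sentence advice."""
    return (estimated_seconds(text) <= max_seconds
            and sentence_count(text) <= max_sentences)


def speakable_json_ld(name: str, url: str, css_selectors: list[str]) -> str:
    """Build WebPage markup whose speakable sections are found by CSS selector."""
    data = {
        "@context": "https://schema.org/",
        "@type": "WebPage",
        "name": name,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": css_selectors,
        },
        "url": url,
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    summary = ("Visual search uses images as queries. "
               "Voice search uses speech. Both rely on structured data.")
    if fits_guideline(summary):
        print(speakable_json_ld("Example story", "https://example.com/story",
                                [".headline", ".summary"]))
```

A length check like this is only a rough editorial aid; the real test is listening to how the section sounds when read by a TTS engine.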

The concept of a “brand voice” looks set to take on a very literal dimension as voice search evolves into something more conversational.

Technical SEO for visual and voice search

If brands can’t predict the variety and volume of demand with precision, they must ensure they are in prime position to attract qualified traffic.

As we move into an era of ambient search, with consumers looking for instant information on the go, it is imperative that content can be served quickly and seamlessly. One technical consideration is that a higher quantity of pre-rendered content needs to be served to the user and to search engines. This is more important than in the past, when a significant amount of processing could occur within the browser.

To respond to (and even pre-empt) user queries via voice or image, pre-rendered content should be delivered to search engine user agents. Structured data is often mentioned in relation to visual and voice search, with good reason. The premise of semantic search, which is an essential development for visual and voice search, is built on the idea of entities and structure. By understanding entities and how they are interconnected, a search engine can infer context and intent from search queries.
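
One way to picture the pre-rendering point above is a server that hands fully rendered HTML to search engine user agents. The Python sketch below is an illustration under stated assumptions: the bot tokens are a short, non-exhaustive sample, the function names are hypothetical, and both responses are assumed to carry the same content so that only the rendering step differs.

```python
# Illustrative, non-exhaustive substrings found in crawler User-Agent headers.
BOT_TOKENS = ("googlebot", "bingbot")


def is_search_engine_agent(user_agent: str) -> bool:
    """Return True when the User-Agent header looks like a known crawler."""
    lowered = user_agent.lower()
    return any(token in lowered for token in BOT_TOKENS)


def choose_response(user_agent: str, prerendered_html: str,
                    app_shell_html: str) -> str:
    """Serve pre-rendered HTML to crawlers and the app shell to browsers.

    Both versions are assumed to contain the same content.
    """
    if is_search_engine_agent(user_agent):
        return prerendered_html
    return app_shell_html


if __name__ == "__main__":
    crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(choose_response(crawler, "<h1>Rendered page</h1>", "<div id='app'></div>"))
```

Serving the pre-rendered version to every visitor is a simpler alternative where it is feasible, since it removes the need to detect user agents at all.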

For visual search, Google’s Clay Bavor summarized the size of the challenge:

In the English language, there’s something like 180,000 words, and we only use 3,000 to 5,000 of them. If you’re trying to do voice recognition, there’s a really small set of things you actually need to be able to recognize. Think about how many objects there are in the world, distinct objects, billions, and they all come in different shapes and sizes.

Brands need to help Googlebot by structuring and labeling their own data so that it can be served instantly for relevant queries.

There are some vital structured data elements that brands should focus on for visual and voice search (if applicable):

  • Price.
  • Availability.
  • Product name.
  • Image.
  • Logos.
  • Social profiles.
  • Breadcrumb navigation.
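
The elements listed above map onto schema.org types such as Product, Offer, Organization and BreadcrumbList. The following Python sketch shows one way the markup could be assembled as JSON-LD; the function names, product details, URLs and social profiles are all hypothetical, and the fields shown are a minimal subset rather than a complete specification.

```python
import json


def product_json_ld(name: str, image_url: str, price: float, currency: str,
                    in_stock: bool, product_url: str) -> dict:
    """Describe product name, image, price and availability."""
    availability = ("https://schema.org/InStock" if in_stock
                    else "https://schema.org/OutOfStock")
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "image": [image_url],
        "offers": {
            "@type": "Offer",
            "url": product_url,
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": availability,
        },
    }


def organization_json_ld(name: str, site_url: str, logo_url: str,
                         social_profiles: list[str]) -> dict:
    """Describe the brand's logo and social profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": site_url,
        "logo": logo_url,
        "sameAs": social_profiles,
    }


def breadcrumb_json_ld(trail: list[tuple[str, str]]) -> dict:
    """Describe breadcrumb navigation as an ordered list of (name, URL) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


def as_script_tag(data: dict) -> str:
    """Wrap markup in the script tag used to embed JSON-LD in a page."""
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"


if __name__ == "__main__":
    product = product_json_ld("Trail Shoe", "https://example.com/img/trail-shoe.jpg",
                              89.5, "USD", True, "https://example.com/trail-shoe")
    print(as_script_tag(product))
```

Generating the markup from the same data source that drives the page helps keep price and availability consistent between what users see and what search engines read.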


Visual and voice search are taking hold for a host of intertwined reasons, both psychological and technological. They allow users to find new ideas in more effective and efficient formats. They also intersect with numerous technological trends, including digital assistants, artificial intelligence, and vertical search.

In the case of vertical search, the discovery of content within specific verticals is a natural fit for targeted information retrieval.

One of the prime benefits of both visual and voice search is that they simply create a platform for more effective communication with consumers. As the role of search expands to cover every step on the path to purchase, the number of search-based micro-moments will continue to proliferate. To capitalize, brands need a deep understanding of their consumers, a multimedia content strategy that caters to their audience’s requirements and the technical knowledge to communicate these messages to search engines through text, voice and images.

The future of search lies with voice, visual and vertical optimization. While that may sound disconcertingly nebulous, savvy marketers are defining what this new order means to them and acting to implement their strategies today.
