Web Crawlers, Google, & SEO in Modern Website Design

published on: October 30th 2019

Nothing is more important than SEO on a site. Achieving great SEO is a complex, exhaustive challenge. The first hurdle, and the most important, is ensuring your site is available for Google's web crawlers to access and assess. Having a sitemap file, a robots.txt file, and exposed HTML links are all vital to that process going well.
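As a rough illustration of the first two files, here is a minimal sketch that builds a sitemap and a robots.txt as plain strings. The function names (`buildSitemap`, `buildRobotsTxt`) and the `example.com` URLs are hypothetical placeholders, not part of any particular tool, and a real site would likely generate these files at build time.

```typescript
// Hypothetical helpers for illustration; names and URLs are assumptions.

// Escape characters that are not allowed raw inside XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Join a base URL and a path without doubling the slash.
function joinUrl(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + (path.startsWith("/") ? path : "/" + path);
}

// Build a sitemap in the sitemaps.org XML format, one <url> entry per path.
function buildSitemap(baseUrl: string, paths: string[]): string {
  const entries = paths.map(
    (path) => `  <url><loc>${escapeXml(joinUrl(baseUrl, path))}</loc></url>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

// Build a robots.txt that allows all crawlers and points them at the sitemap.
function buildRobotsTxt(baseUrl: string): string {
  return [
    "User-agent: *",
    "Allow: /",
    `Sitemap: ${joinUrl(baseUrl, "/sitemap.xml")}`,
  ].join("\n");
}

console.log(buildSitemap("https://example.com", ["/", "/about", "/contact"]));
console.log(buildRobotsTxt("https://example.com"));
```

The two outputs would typically be saved as `sitemap.xml` and `robots.txt` at the root of the site, where crawlers look for them.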

The benefits of a Single Page Application, especially when incorporating a reusable UI library like React, are widely known. Building, reusing, and updating page components is a cornerstone of modern web development. Less widely known is that a typical React SPA can be terrible at SEO. Why is this?

A typical Googlebot-HTML interaction is straightforward: the bot crawls and gathers the site's content, then sends the information gathered to the indexer. With a SPA there is a problem. When the crawler comes across links coded in JavaScript, it does not recognize them as internal links and forgoes crawling them for content. However, it does send this strange data to the indexer, which DOES know to parse and execute the JavaScript masking the links. The indexer then sends the links back to the crawler so it can crawl them and discover the pages and content. Here is where the problem lies: by that point, the crawler could be gone.
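To make the difference concrete, here is a deliberately simplified sketch of link discovery. It is not how Googlebot actually works; `extractHrefLinks` is a hypothetical function that only looks for plain `<a href="...">` anchors, which is enough to show why a link hidden behind a JavaScript click handler goes unnoticed on a first pass.

```typescript
// Simplified stand-in for a crawler's link discovery (an assumption for
// illustration, not Google's implementation): collect the href value of
// every <a> tag found in the raw HTML, without running any JavaScript.
function extractHrefLinks(html: string): string[] {
  const links: string[] = [];
  const anchorPattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>/gi;
  let match: RegExpExecArray | null;
  while ((match = anchorPattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Plain HTML links: visible to the crawler immediately.
const staticNav = `<nav><a href="/about">About</a><a href="/pricing">Pricing</a></nav>`;

// JavaScript-driven navigation: no href for the crawler to follow.
const scriptedNav = `<nav><span onclick="navigate('/about')">About</span></nav>`;

console.log(extractHrefLinks(staticNav));   // two links found
console.log(extractHrefLinks(scriptedNav)); // no links found
```

The second call finds nothing, even though a human visitor could click through to the same page.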

We solve this major issue, while keeping the benefits of React, by delivering your entire site as static HTML on the initial request from a user's device. That means every internal link, external link, word, keyword, phrase, and image is available for Google to easily index as plain old HTML. If the content is not available on your page when the crawler comes a-crawling, Google and the world will not know about it.
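The idea can be sketched without any framework. The snippet below is an assumption-laden stand-in for what React server-side rendering or static generation would produce, not a description of any specific build pipeline; the `Page` shape and `renderStaticPage` function are hypothetical. The point is only that the response already contains the text and real `<a href>` links before any JavaScript runs.

```typescript
// Hypothetical page model, for illustration only.
interface Page {
  title: string;
  body: string;
  links: { href: string; label: string }[];
}

// Escape text so it is safe to place inside HTML content or attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Produce a complete HTML document as a string, with every link rendered
// as a plain anchor that a crawler can follow without executing scripts.
function renderStaticPage(page: Page): string {
  const nav = page.links
    .map((link) => `<a href="${escapeHtml(link.href)}">${escapeHtml(link.label)}</a>`)
    .join("");
  return [
    "<!doctype html>",
    "<html>",
    `<head><title>${escapeHtml(page.title)}</title></head>`,
    "<body>",
    `<nav>${nav}</nav>`,
    `<main><h1>${escapeHtml(page.title)}</h1><p>${escapeHtml(page.body)}</p></main>`,
    "</body>",
    "</html>",
  ].join("\n");
}

const html = renderStaticPage({
  title: "About Us",
  body: "Everything on this page is present in the first response.",
  links: [
    { href: "/", label: "Home" },
    { href: "/contact", label: "Contact" },
  ],
});
console.log(html);
```

In a React project the same result would usually come from a server-rendering or static-generation step rather than hand-built strings; the string version here just keeps the example dependency-free.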

What exactly is a web crawler? Google describes it this way:

"The web is like an ever-growing library with billions of books and no central filing system. We use software known as web crawlers to discover publicly available webpages. Crawlers look at webpages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google’s servers."

This next part is very important.

"The software pays special attention to new sites, changes to existing sites and dead links. Computer programs determine which sites to crawl, how often and how many pages to fetch from each site."

In other words, a web crawler may check only one page or it may check all of them, and it may come once a week or every day, so having everything ready and rendered at all times is key. Here at Whipstitch we do that by default.