White Paper on Optimizing Websites for Search Engines

COURTESY :- vrindawan.in

Wikipedia

Search engine optimization (SEO) is the process of making websites appear as a top result when someone searches for a specific term. The search term is called a keyword. The results of the search are displayed on search engine results pages (SERPs), such as those given by Google. Search engine optimization is often described under the broader labels of internet marketing or search engine marketing.

On-page SEO, also called “on-site SEO,” refers to the practice of optimizing web pages and content for both users and search engines to boost rankings and generate website traffic. In other words, it covers the elements and attributes inside a website, such as content, HTML, system, CSS, and internal links.

On-page SEO remains a predominant strategy among website owners and web development agencies because it is essential to ranking a website through its content and to boosting online visibility through keywords, which Google uses to index websites and rank them accordingly. In fact, Google’s own “How Search Works” states, “The most basic signal that information is relevant is when a web page contains the same keyword as the search query. If those keywords appear on the page, or if they appear in the headings or body of the text, the information is more likely to be relevant.”

Off-page SEO refers to the elements and attributes outside of a website that increase online engagement and are vital in attracting potential customers. It focuses mainly on link-building strategy, social media integration, and local SEO.

Off-page SEO is the counterpart of on-page SEO, and it carries the same level of relevance: one cannot work without the other. It is one of the best ways to provide relevant results to users and to earn backlinks from other websites, which boost credibility. Because off-page SEO focuses on activity outside a website, it is expected to raise both the site’s search ranking and its PageRank. It also draws on social media platforms, which can increase engagement and exposure and help establish brand awareness and trustworthiness.

Technical SEO refers to the non-content elements of your site. It covers elements that affect the optimization and proper operation of the website, such as site speed, mobile-friendliness, and crawlability.

Technical SEO is the primary strategy used by website development agencies. Correct and fast operation of the website is a key element that has a fundamental impact on the reception of the website.

SEO has become an important part of marketing campaigns run by businesses to find customers online because 65% of the people who search for something don’t look beyond the fifth result on the search engine results page. Many businesses hire SEO companies to do this work because it takes too much time to do themselves.

Search engine companies such as Google, Yahoo, and Bing take care that spam does not affect their search engine results pages by adding filters such as Google’s PageRank. These filters are known as algorithms.

Most people do not need to pay for search engine optimization services because usually a website is supposed to show up on the search engines over time anyway. But some online shopping websites may want to show up sooner, and so they pay for search engine optimization from companies that provide it.

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic (known as “natural” or “organic” results) rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.

As an Internet marketing strategy, SEO considers how search engines work, the computer-programmed algorithms that dictate search engine behavior, what people search for, the actual search terms or keywords typed into search engines, and which search engines are preferred by their targeted audience. SEO is performed because a website will receive more visitors from a search engine when websites rank higher on the search engine results page (SERP). These visitors can then potentially be converted into customers.

Webmasters and content providers began optimizing websites for search engines in the mid-1990s, as the first search engines were cataloging the early Web. Initially, all webmasters needed to do was submit the address of a page, or URL, to the various engines, which would send a web crawler to crawl that page, extract links to other pages from it, and return information found on the page to be indexed. The process involves a search engine spider downloading a page and storing it on the search engine’s own server. A second program, known as an indexer, extracts information about the page, such as the words it contains, where they are located, and any weight for specific words, as well as all links the page contains. All of this information is then placed into a scheduler for crawling at a later date.
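The crawl, index, and schedule loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (`PAGES`) rather than real HTTP fetches, the `example.test` URLs are made up, and the index records only word positions, not the weights a real indexer might assign.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects outgoing links and visible words from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(pages, seed):
    """Crawl the in-memory `pages` dict (hypothetical stand-in for the web).

    Returns (index, order): index maps word -> {url: [word positions]},
    order lists the URLs in the sequence they were crawled.
    """
    index = defaultdict(dict)
    schedule = deque([seed])  # URLs waiting to be crawled
    seen = {seed}
    order = []
    while schedule:
        url = schedule.popleft()
        html = pages.get(url)
        if html is None:  # link to a page that cannot be fetched
            continue
        order.append(url)
        parser = PageParser(url)
        parser.feed(html)
        parser.close()
        # Indexer step: record which words occur where on the page.
        for position, word in enumerate(parser.words):
            index[word].setdefault(url, []).append(position)
        # Scheduler step: queue newly discovered links for a later crawl.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                schedule.append(link)
    return index, order


PAGES = {
    "http://example.test/": '<a href="/about">About</a> welcome home',
    "http://example.test/about": '<a href="/">Home</a> about this site',
}
index, order = crawl(PAGES, "http://example.test/")
```

Only the seed URL is submitted; the second page is discovered by following a link, which mirrors the submission model described above.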

Website owners recognized the value of a high ranking and visibility in search engine results, creating an opportunity for both white hat and black hat SEO practitioners. According to industry analyst Danny Sullivan, the phrase “search engine optimization” probably came into use in 1997. Sullivan credits Bruce Clay as one of the first people to popularize the term.

Early versions of search algorithms relied on webmaster-provided information such as the keyword meta tag or index files in engines like ALIWEB. Meta tags provide a guide to each page’s content. Using metadata to index pages was found to be less than reliable, however, because the webmaster’s choice of keywords in the meta tag could potentially be an inaccurate representation of the site’s actual content. Flawed data in meta tags, such as tags that were inaccurate, incomplete, or falsely attributed, created the potential for pages to be mischaracterized in irrelevant searches. Web content providers also manipulated some attributes within the HTML source of a page in an attempt to rank well in search engines. By 1997, search engine designers recognized that webmasters were making efforts to rank well in their search engine and that some webmasters were even manipulating their rankings in search results by stuffing pages with excessive or irrelevant keywords. Early search engines, such as AltaVista and Infoseek, adjusted their algorithms to prevent webmasters from manipulating rankings.

By heavily relying on factors such as keyword density, which were exclusively within a webmaster’s control, early search engines suffered from abuse and ranking manipulation. To provide better results to their users, search engines had to adapt to ensure their results pages showed the most relevant search results, rather than unrelated pages stuffed with numerous keywords by unscrupulous webmasters. This meant moving away from heavy reliance on term density to a more holistic process for scoring semantic signals. Since the success and popularity of a search engine are determined by its ability to produce the most relevant results to any given search, poor quality or irrelevant search results could lead users to find other search sources. Search engines responded by developing more complex ranking algorithms, taking into account additional factors that were more difficult for webmasters to manipulate.
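To see why keyword density was so easy to manipulate, consider a minimal sketch of the metric. The function name `keyword_density` and the two sample texts are invented for illustration, and the calculation handles single-word keywords only; it is not the formula of any particular search engine.

```python
import re


def keyword_density(text, keyword):
    """Return the fraction of words in `text` that equal `keyword`.

    Case-insensitive; single-word keywords only (a simplifying assumption).
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Hypothetical page texts: one written for readers, one stuffed with a keyword.
natural = "We sell handmade shoes and repair boots in our small workshop."
stuffed = "Shoes shoes cheap shoes buy shoes best shoes shoes online."

natural_score = keyword_density(natural, "shoes")
stuffed_score = keyword_density(stuffed, "shoes")
```

The stuffed text scores far higher on this measure even though it says nothing useful, and the author controls every input to the calculation. That is the weakness that pushed search engines toward signals outside the webmaster's control.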

Companies that employ overly aggressive techniques can get their client websites banned from the search results. In 2005, the Wall Street Journal reported on a company, Traffic Power, which allegedly used high-risk techniques and failed to disclose those risks to its clients. Wired magazine reported that the same company sued blogger and SEO Aaron Wall for writing about the ban. Google’s Matt Cutts later confirmed that Google did in fact ban Traffic Power and some of its clients.

Some search engines have also reached out to the SEO industry and are frequent sponsors and guests at SEO conferences, webchats, and seminars. Major search engines provide information and guidelines to help with website optimization. Google has a Sitemaps program to help webmasters learn if Google is having any problems indexing their website and also provides data on Google traffic to the website. Bing Webmaster Tools provides a way for webmasters to submit a sitemap and web feeds, allows users to determine the “crawl rate,” and lets them track the index status of their web pages.

In 2015, it was reported that Google was developing and promoting mobile search as a key feature within future products. In response, many brands began to take a different approach to their Internet marketing strategies.

In 1998, two graduate students at Stanford University, Larry Page and Sergey Brin, developed “Backrub,” a search engine that relied on a mathematical algorithm to rate the prominence of web pages. The number calculated by the algorithm, PageRank, is a function of the quantity and strength of inbound links. PageRank estimates the likelihood that a given page will be reached by a web user who randomly surfs the web and follows links from one page to another. In effect, this means that some links are stronger than others, as a higher PageRank page is more likely to be reached by the random web surfer.
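The random-surfer idea can be illustrated with a short power-iteration sketch. This follows the textbook formulation of PageRank, not Google's production system, which is undisclosed. The four-page graph, the damping factor of 0.85 (a commonly cited value), and the rule that pages without outgoing links spread their score evenly are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to. Every link
    target must also appear as a key (an assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline from surfers who jump at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its score on, split evenly across its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: treat it as linking to every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: C has the most inbound links, D has none.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(GRAPH)
```

In this toy graph, C ends up with the highest score and D with the lowest. A scores well above B because its single inbound link comes from the high-scoring C, which shows how one link can be "stronger" than another.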

Page and Brin founded Google in 1998. Google attracted a loyal following among the growing number of Internet users, who liked its simple design. Off-page factors (such as PageRank and hyperlink analysis) were considered as well as on-page factors (such as keyword frequency, meta tags, headings, links and site structure) to enable Google to avoid the kind of manipulation seen in search engines that only considered on-page factors for their rankings. Although PageRank was more difficult to game, webmasters had already developed link-building tools and schemes to influence the Inktomi search engine, and these methods proved similarly applicable to gaming PageRank. Many sites focus on exchanging, buying, and selling links, often on a massive scale. Some of these schemes, or link farms, involved the creation of thousands of sites for the sole purpose of link spamming.

By 2004, search engines had incorporated a wide range of undisclosed factors in their ranking algorithms to reduce the impact of link manipulation. In June 2007, The New York Times’ Saul Hansell stated that Google ranks sites using more than 200 different signals. The leading search engines, Google, Bing, and Yahoo, do not disclose the algorithms they use to rank pages. Some SEO practitioners have studied different approaches to search engine optimization and have shared their personal opinions. Patents related to search engines can provide information to better understand search engines. In 2005, Google began personalizing search results for each user. Depending on their history of previous searches, Google crafted results for logged-in users.

In 2007, Google announced a campaign against paid links that transfer PageRank. On June 15, 2009, Google disclosed that it had taken measures to mitigate the effects of PageRank sculpting by use of the nofollow attribute on links. Matt Cutts, a well-known software engineer at Google, announced that Googlebot would no longer treat nofollowed links in the same way, to prevent SEO service providers from using nofollow for PageRank sculpting. As a result of this change, the usage of nofollow led to evaporation of PageRank. To avoid this, SEO engineers developed alternative techniques that replace nofollowed tags with obfuscated JavaScript and thus permit PageRank sculpting. Additionally, several solutions have been suggested that include the usage of iframes, Flash, and JavaScript.
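A small sketch may help make the nofollow mechanics concrete. The first part separates a page's links by their `rel` attribute; the second is a deliberately simplified model of the "evaporation" effect. The class and function names, the sample HTML, and the even-split arithmetic are assumptions made for illustration and do not describe Google's actual computation.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Sorts <a href> links into followed and nofollowed lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel holds space-separated tokens, e.g. rel="sponsored nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def share_per_followed_link(page_rank, followed, nofollowed, evaporates):
    """Simplified model: score passed through each followed link.

    If `evaporates` is False, nofollowed links are ignored and the score is
    split among followed links only (sculpting works). If True, the score is
    split across all links and the nofollowed shares are simply lost.
    """
    divisor = followed + nofollowed if evaporates else followed
    return page_rank / divisor


# Hypothetical page fragment with one followed and two nofollowed links.
SAMPLE_HTML = (
    '<a href="/pricing">Pricing</a>'
    '<a href="https://ads.example/offer" rel="sponsored nofollow">Ad</a>'
    '<a href="/login" rel="nofollow">Log in</a>'
)
parser = LinkRelParser()
parser.feed(SAMPLE_HTML)
parser.close()

before = share_per_followed_link(10, followed=5, nofollowed=5, evaporates=False)
after = share_per_followed_link(10, followed=5, nofollowed=5, evaporates=True)
```

Under this model, a page with ten units of score and ten links, five of them nofollowed, passes two units per followed link when sculpting works and only one unit per followed link once the nofollowed shares evaporate.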

In December 2009, Google announced it would be using the web search history of all its users in order to populate search results. On June 8, 2010, a new web indexing system called Google Caffeine was announced. Designed to allow users to find news results, forum posts, and other content much sooner after publishing than before, Google Caffeine was a change to the way Google updated its index in order to make things show up quicker on Google than before. According to Carrie Grimes, the software engineer who announced Caffeine for Google, “Caffeine provides 50 percent fresher results for web searches than our last index.” Google Instant, real-time search, was introduced in late 2010 in an attempt to make search results more timely and relevant. Historically, site administrators have spent months or even years optimizing a website to increase search rankings. With the growth in popularity of social media sites and blogs, the leading engines made changes to their algorithms to allow fresh content to rank quickly within the search results.

In February 2011, Google announced the Panda update, which penalizes websites containing content duplicated from other websites and sources. Historically, websites have copied content from one another and benefited in search engine rankings by engaging in this practice. However, Google implemented a new system that punishes sites whose content is not unique. The 2012 Google Penguin attempted to penalize websites that used manipulative techniques to improve their rankings on the search engine. Although Google Penguin has been presented as an algorithm aimed at fighting web spam, it really focuses on spammy links by gauging the quality of the sites the links are coming from. The 2013 Google Hummingbird update featured an algorithm change designed to improve Google’s natural language processing and semantic understanding of web pages. Hummingbird’s language processing system falls under the newly recognized term of “conversational search,” where the system pays more attention to each word in the query in order to better match the pages to the meaning of the query rather than a few words. With regard to the changes made to search engine optimization, for content publishers and writers, Hummingbird is intended to resolve issues by getting rid of irrelevant content and spam, so that Google can surface high-quality content and rely on its creators as ‘trusted’ authors.

In October 2019, Google announced they would start applying BERT models for English language search queries in the US. Bidirectional Encoder Representations from Transformers (BERT) was another attempt by Google to improve their natural language processing, but this time in order to better understand the search queries of their users. In terms of search engine optimization, BERT was intended to connect users more easily to relevant content and increase the quality of traffic coming to websites that rank on the search engine results page.

The leading search engines, such as Google, Bing, and Yahoo!, use crawlers to find pages for their algorithmic search results. Pages that are linked from other search engine-indexed pages do not need to be submitted because they are found automatically. The Yahoo! Directory and DMOZ, two major directories which closed in 2014 and 2017 respectively, both required manual submission and human editorial review. Google offers Google Search Console, through which an XML Sitemap feed can be created and submitted for free to ensure that all pages are found, especially pages that are not discoverable by automatically following links; this is in addition to its URL submission console. Yahoo! formerly operated a paid submission service that guaranteed crawling for a cost per click; however, this practice was discontinued in 2009.
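An XML Sitemap is simply a list of URLs in a standard format. The sketch below builds a minimal one with Python's standard library; the function name `build_sitemap` and the `example.com` URLs are placeholders, and optional fields such as last-modified dates are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML Sitemap document listing `urls`."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site: the second page is not linked from anywhere else.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/orphan-page?ref=a&lang=en",
])
```

The resulting string can be saved as a file such as `sitemap.xml` and submitted through the search engine's webmaster console. Listing the unlinked page is what makes it discoverable even though no crawler would reach it by following links.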

Search engine crawlers may look at a number of different factors when crawling a site. Not every page is indexed by search engines. The distance of pages from the root directory of a site may also be a factor in whether or not pages get crawled.
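One rough way to gauge how far a page sits from the site root is to count the segments in its URL path. The helper below is a hypothetical illustration of that idea only; search engines do not publish how, or whether, they measure this distance, and the number of clicks needed to reach a page may matter more than its path.

```python
from urllib.parse import urlparse


def path_depth(url):
    """Count the non-empty path segments in `url` (0 for the site root)."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


# Hypothetical URLs, sorted from shallowest to deepest.
urls = [
    "https://example.com/archive/2020/05/post.html",
    "https://example.com/",
    "https://example.com/blog/",
]
by_depth = sorted(urls, key=path_depth)
```

A site audit might use a measure like this to flag deeply nested pages that deserve a link from a page closer to the root.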

Today, most people are searching on Google using a mobile device. In November 2016, Google announced a major change to the way it crawls websites and started to make its index mobile-first, which means the mobile version of a given website becomes the starting point for what Google includes in its index. In May 2019, Google updated the rendering engine of its crawler to be the latest version of Chromium (74 at the time of the announcement). Google indicated that it would regularly update the Chromium rendering engine to the latest version. In December 2019, Google began updating the User-Agent string of its crawler to reflect the latest Chrome version used by its rendering service. The delay was to allow webmasters time to update their code that responded to particular bot User-Agent strings. Google ran evaluations and felt confident the impact would be minor.
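Code that matched a crawler's User-Agent string exactly was the kind most at risk from this change. A more tolerant approach, sketched below, looks for the crawler's product token and treats the Chrome version as a variable. The two sample strings are illustrative, modelled on the general shape of Googlebot's older and newer User-Agent formats, and the version number in the second is a placeholder.

```python
import re

# Match the crawler token regardless of what surrounds it.
GOOGLEBOT_TOKEN = re.compile(r"\bGooglebot/\d+(\.\d+)*", re.IGNORECASE)
# Capture the Chrome major version when one is present.
CHROME_VERSION = re.compile(r"\bChrome/(\d+)\.")


def claims_googlebot(user_agent):
    """True if the string claims to be Googlebot (a claim, not proof)."""
    return bool(GOOGLEBOT_TOKEN.search(user_agent))


def chrome_major(user_agent):
    """Return the Chrome major version as an int, or None if absent."""
    match = CHROME_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


# Illustrative sample strings (assumptions, not captured traffic).
OLD_STYLE = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
NEW_STYLE = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) "
    "Chrome/74.0.3729.131 Safari/537.36"
)
```

Both strings are recognized by `claims_googlebot`, and only the second yields a Chrome version. Because any client can send any User-Agent string, a match shows only what the client claims to be.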