Working with Open Source Web Server Innovation

COURTESY :- vrindawan.in

Wikipedia

Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized software development model that encourages open collaboration. A main principle of open-source software development is peer production, with products such as source code, blueprints, and documentation freely available to the public. The open-source movement in software began as a response to the limitations of proprietary code. The model is also used for projects such as open-source appropriate technology and open-source drug discovery.

Open source promotes universal access via an open-source or free license to a product’s design or blueprint, and universal redistribution of that design or blueprint. Before the phrase open source became widely adopted, developers and producers used a variety of other terms. Open source gained hold with the rise of the Internet. The open-source software movement arose to clarify copyright, licensing, domain, and consumer issues.

Generally, open source refers to a computer program in which the source code is available to the general public for use or modification from its original design. Code is released under the terms of a software license. Depending on the license terms, others may then download, modify, and publish their version (fork) back to the community.

Many large formal institutions have sprung up to support the development of the open-source movement, including the Apache Software Foundation, which supports community projects such as the open-source framework Apache Hadoop and the open-source Apache HTTP Server.

The sharing of technical information predates the Internet and the personal computer considerably. For instance, in the early years of automobile development a group of capital monopolists owned the rights to a 2-cycle gasoline-engine patent originally filed by George B. Selden. By controlling this patent, they were able to monopolize the industry and force car manufacturers to adhere to their demands, or risk a lawsuit.

In 1911, independent automaker Henry Ford won a challenge to the Selden patent. The result was that the Selden patent became virtually worthless and a new association (which would eventually become the Motor Vehicle Manufacturers Association) was formed. The new association instituted a cross-licensing agreement among all US automotive manufacturers: although each company would develop technology and file patents, these patents were shared openly and without the exchange of money among all the manufacturers. By the time the US entered World War II, 92 Ford patents and 515 patents from other companies were being shared among these manufacturers, without any exchange of money (or lawsuits).

Early instances of the free sharing of source code include IBM’s source releases of its operating systems and other programs in the 1950s and 1960s, and the SHARE user group that formed to facilitate the exchange of software. Beginning in the 1960s, ARPANET researchers used an open “Request for Comments” (RFC) process to encourage feedback in early telecommunication network protocols. This led to the birth of the early Internet in 1969.

The sharing of source code on the Internet began when the Internet was relatively primitive, with software distributed via UUCP, Usenet, IRC, and Gopher. BSD, for example, was first widely distributed by posts to comp.os.linux on the Usenet, which is also where its development was discussed. Linux followed in this model.

One recorded use of the term “open source” dates from 1996, where it meant that the source code was published but the software was not free (purchasing a license was still required for use). Other recollections place the term in use during the 1980s.

The term was later proposed by a group of people in the free software movement who were critical of the political agenda and moral philosophy implied in the term “free software” and sought to reframe the discourse to reflect a more commercially minded position. In addition, the ambiguity of the term “free software” was seen as discouraging business adoption. However, the ambiguity of the word “free” exists primarily in English, where it can also refer to cost. The group included Christine Peterson, Todd Anderson, Larry Augustin, Jon Hall, Sam Ockman, Michael Tiemann and Eric S. Raymond. Peterson suggested “open source” at a meeting held in Palo Alto, California, in reaction to Netscape’s announcement in January 1998 of a source code release for Navigator. Linus Torvalds gave his support the following day, and Phil Hughes backed the term in Linux Journal. Richard Stallman, the founder of the free software movement, quickly decided against endorsing the term. Netscape released its source code under the Netscape Public License and later under the Mozilla Public License.

Raymond was especially active in the effort to popularize the new term. He made the first public call to the free software community to adopt it in February 1998. Shortly after, he founded The Open Source Initiative in collaboration with Bruce Perens.

The term gained further visibility through an event organized in April 1998 by technology publisher Tim O’Reilly. Originally titled the “Freeware Summit” and later known as the “Open Source Summit,” the event was attended by the leaders of many of the most important free and open-source projects, including Linus Torvalds, Larry Wall, Brian Behlendorf, Eric Allman, Guido van Rossum, Michael Tiemann, Paul Vixie, Jamie Zawinski, and Eric Raymond. At that meeting, alternatives to the term “free software” were discussed. Tiemann argued for “sourceware” as a new term, while Raymond argued for “open source.” The assembled developers took a vote, and the winner was announced at a press conference the same evening.

“Open source” has never managed to entirely supersede the older term “free software,” giving rise to the combined term free and open-source software (FOSS).

A web server is computer software and underlying hardware that accepts requests via HTTP (the network protocol created to distribute web content) or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.
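The request/response exchange described above can be sketched with Python’s standard library. This is only an illustration under stated assumptions, not how any production server is built: the `PAGES` table, the `DemoHandler` class and the `run_demo` helper are hypothetical names introduced here, and the server listens only on the local machine.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "site": request path -> response body.
PAGES = {"/": b"<h1>Hello</h1>"}


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:
            # Unknown resource: answer with an error message instead of content.
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(port, path):
    """Act as a tiny user agent: send one GET request, return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def run_demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(port, "/"), fetch(port, "/missing")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` returns two `(status, body)` pairs: one for a resource that exists and one for a path the server does not know, which yields an error status.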

The hardware used to run a web server can vary according to the volume of requests that it needs to handle. At the low end of the range are embedded systems, such as a router that runs a small web server as its configuration interface. A high-traffic Internet website might handle requests with hundreds of servers that run on racks of high-speed computers.

A resource sent from a web server can be a preexisting file (static content) available to the web server, or it can be generated at the time of the request (dynamic content) by another program that communicates with the server software. The former usually can be served faster and can be more easily cached for repeated requests, while the latter supports a broader range of applications.
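The difference between static and dynamic content can be illustrated with a small dispatch function. This is a sketch under assumptions: `STATIC_FILES`, `render_time` and `respond` are hypothetical names, and a real server would read files from disk and hand dynamic requests to a separate program rather than a local function.

```python
from datetime import datetime, timezone

# Hypothetical preexisting files: the bytes already exist before any request.
STATIC_FILES = {"/index.html": b"<h1>Home</h1>"}


def render_time(now):
    """Stand-in for an external program that builds a page at request time."""
    return f"<p>Generated at {now.isoformat()}</p>".encode()


def respond(path, now=None):
    """Return (kind, body), where kind is 'static', 'dynamic' or 'error'."""
    if path in STATIC_FILES:
        # Static: the same bytes every time, so the response is easy to cache.
        return "static", STATIC_FILES[path]
    if path == "/time":
        # Dynamic: the body depends on when the request arrives.
        return "dynamic", render_time(now or datetime.now(timezone.utc))
    return "error", b"404 Not Found"
```

Two requests for `/index.html` always return identical bytes, while two requests for `/time` generally do not, which is why the static case is the one that caches well.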

Technologies such as REST and SOAP, which use HTTP as a basis for general computer-to-computer communication, as well as support for WebDAV extensions, have extended the application of web servers well beyond their original purpose of serving human-readable pages.

This is a very brief history of web server programs, so some information necessarily overlaps with the histories of web browsers, the World Wide Web and the Internet. For the sake of clarity, some of the key historical information reported below may therefore resemble what is found in those histories.

In March 1989, Sir Tim Berners-Lee proposed a new project to his employer CERN, with the goal of easing the exchange of information between scientists by using a hypertext system. The proposal, titled “HyperText and CERN,” asked for comments and was read by several people. In October 1990 the proposal was reformulated and enriched (with Robert Cailliau as co-author), and finally it was approved.

Between late 1990 and early 1991 the project resulted in Berners-Lee and his developers writing and testing several software libraries along with three programs, which initially ran on the NeXTSTEP OS installed on NeXT workstations:

  • a graphical web browser, called WorldWideWeb;
  • a portable line mode web browser;
  • a web server, later known as CERN httpd.

Those early browsers retrieved web pages from web server(s) using a new basic communication protocol that was named HTTP 0.9.

In August 1991 Tim Berners-Lee announced the birth of WWW technology and encouraged scientists to adopt and develop it. Soon after, those programs, along with their source code, were made available to people interested in using them. In practice CERN informally allowed other people, including developers, to experiment with and possibly further develop what had been made up to that moment. This was the official birth of CERN httpd. From then on Berners-Lee promoted the adoption and use of those programs, along with their porting to other OSs.

In December 1991 the first web server outside Europe was installed at SLAC (U.S.A.). This was a very important event because it started trans-continental web communications between web browsers and web servers.

From 1991 to 1993 the CERN web server program continued to be actively developed by the www group. Meanwhile, thanks to the availability of its source code and the public specifications of the HTTP protocol, many other web server implementations started to be developed.

In April 1993 CERN issued an official public statement declaring that the three components of Web software (the basic line-mode client, the web server and the library of common code), along with their source code, were put in the public domain. This statement freed web server developers from any possible legal issue about the development of derivative work based on that source code (a threat that in practice never existed).

At the beginning of 1994, the most notable among new web servers was NCSA httpd which ran on a variety of Unix-based OSs and could serve dynamically generated content by implementing the POST HTTP method and the CGI to communicate with external programs. These capabilities, along with the multimedia features of NCSA’s Mosaic browser (also able to manage HTML FORMs in order to send data to a web server) highlighted the potential of web technology for publishing and distributed computing applications.

In the second half of 1994, the development of NCSA httpd stalled to the point that a group of external software developers, webmasters and other professionals interested in that server started to write and collect patches, which was possible because the NCSA httpd source code was in the public domain. At the beginning of 1995 those patches were all applied to the last release of the NCSA source code and, after several tests, the Apache HTTP server project was started.

At the end of 1994 a new commercial web server, named Netsite, was released with specific features. It was the first one of many other similar products that were developed first by Netscape, then also by Sun Microsystems, and finally by Oracle Corporation.

In mid-1995 Microsoft released the first version of IIS, for the Windows NT OS. This was a notable event because it marked the entry, in the field of World Wide Web technologies, of a very important commercial developer and vendor that has played and still plays a key role on both sides (client and server) of the web.

In the second half of 1995 the CERN and NCSA web servers started to decline (in global percentage usage) because of the widespread adoption of new web servers, which had a much faster development cycle along with more features, more fixes applied, and better performance than the earlier ones.

At the end of 1996 there were already over fifty known (different) web server programs available to anybody who wanted to own an Internet domain name and/or host websites. Many of them were short-lived and were replaced by other web servers.

The publication of RFCs about protocol versions HTTP/1.0 (1996) and HTTP/1.1 (1997, 1999) forced most web servers to comply (not always completely) with those standards. The use of TCP/IP persistent connections (HTTP/1.1) required web servers both to greatly increase the maximum number of concurrent connections allowed and to improve their scalability.
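What a persistent connection changes can be seen in a small local experiment. The sketch below assumes Python’s standard `http.server` and `http.client` modules; `KeepAliveHandler` and `count_connections` are hypothetical names, and counting connections in the handler’s `setup()` method is just one convenient way to observe them.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connections = 0                # TCP connections accepted so far

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        type(self).connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where the response ends,
        # so the same connection can carry the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(reuse, requests=3):
    """Send `requests` GETs and return how many connections the server saw."""
    KeepAliveHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        if reuse:
            # One persistent connection carries every request.
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            for _ in range(requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # A fresh connection per request, as without persistence.
            for _ in range(requests):
                conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
        return KeepAliveHandler.connections
    finally:
        server.shutdown()
        server.server_close()
```

Comparing `count_connections(reuse=True)` with `count_connections(reuse=False)` shows the trade-off: reuse means fewer connections are opened, but each one stays open longer, so a busy server has to hold many more connections at the same time.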

Between 1996 and 1999 Netscape Enterprise Server and Microsoft’s IIS emerged among the leading commercial options, whereas among the freely available and open-source programs Apache HTTP Server held the lead as the preferred server (because of its reliability and its many features).

In those years there was also another notable commercial web server, the highly innovative Zeus (now discontinued), which was known as one of the fastest and most scalable web servers on the market, at least until the first decade of the 2000s, despite its low percentage of usage.

Apache was the most used web server from mid-1996 to the end of 2015 when, after a few years of decline, it was surpassed initially by IIS and then by Nginx. Afterward IIS dropped to much lower percentages of usage than Apache.

From 2005-2006 Apache began to improve its speed and scalability by introducing new performance features (e.g. the event MPM and a new content cache). Because these improvements were initially marked as experimental, users left them disabled for a long time. Apache therefore suffered even more from competition, both from commercial servers and, above all, from other open-source servers that had achieved far superior performance (mostly when serving static content) since the beginning of their development and that, by the time of Apache’s decline, also offered a long enough list of well-tested advanced features.

In fact, a few years after 2000, new servers emerged: not only other commercial and highly competitive web servers, e.g. LiteSpeed, but also many other open-source programs, often of excellent quality and very high performance. Notable among them were Hiawatha, Cherokee HTTP server, Lighttpd, Nginx and other derived/related products also available with commercial support.

Around 2007-2008 most popular web browsers increased their previous default limit of 2 persistent connections per host-domain (a limit recommended by RFC 2616) to 4, 6 or 8 persistent connections per host-domain, in order to speed up the retrieval of heavy web pages with lots of images, and to mitigate the shortage of persistent connections dedicated to dynamic objects used for bi-directional notifications of events in web pages. Within a year, these changes, on average, nearly tripled the maximum number of persistent connections that web servers had to manage. This trend gave a strong impetus to the adoption of reverse proxies in front of slower web servers. It also gave one more chance to the emerging new web servers, which could show their speed and their capability to handle very high numbers of concurrent connections without requiring too many hardware resources (expensive computers with lots of CPUs, RAM and fast disks).