Markup language & JSON
COURTESY :- vrindawan.in
Wikipedia
Markup language refers to a text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. Markup is often used to control the display of the document or to enrich its content to facilitate automated processing. A markup language is a set of rules governing what markup information may be included in a document and how it is combined with the content of the document in a way to facilitate use by humans and computer programs. The idea and terminology evolved from the “marking up” of paper manuscripts (i.e., the revision instructions by editors), which is traditionally written with a red pen or blue pencil on authors’ manuscripts.
![]()
Older markup languages, which typically focus on typography and presentation, include troff, TeX and LaTeX. Scribe and most modern markup languages, for example XML, identify document components (for example headings, paragraphs, and tables), with the expectation that technology such as style sheets will be used to apply formatting or other processing.
Some markup languages, such as the widely used HTML, have pre-defined presentation semantics, meaning that their specification prescribes some aspects of how to present the structured data on particular media. HTML, like DocBook, Open eBook, JATS and many others, is based on the markup meta-languages SGML and XML. That is, SGML and XML allow designers to specify particular schemas, which determine which elements, attributes, and other features are permitted, and where.
One extremely important characteristic of most markup languages is that they allow intermingling markup with document content such as text and pictures. For example, if a few words in a sentence need to be emphasized, or identified as a proper name, defined term, or other special item, the markup may be inserted between the characters of the sentence. This is quite different structurally from traditional databases, where it is by definition impossible to have data that is within a record but not within any field. Furthermore, markup for human-readable texts must maintain ordering: it would not suffice to make each paragraph of a book into a “paragraph” record, where those records do not maintain order.
The noun markup is derived from the traditional publishing practice called “marking up” a manuscript,which involves adding handwritten annotations in the form of conventional symbolic printer’s instructions — in the margins and the text of a paper or a printed manuscript.
For centuries, this task was done primarily by skilled typographers known as “markup men” or “d markers who marked up text to indicate what typeface, style, and size should be applied to each part, and then passed the manuscript to others for typesetting by hand or machine.
Markup was also commonly applied by editors, proofreaders, publishers, and graphic designers, and indeed by document authors, all of whom might also mark other things, such as corrections, changes, etc.
There are three main general categories of electronic markup, articulated in Coombs, Renear and De Rose (1987), and Bray (2003).
The kind of markup used by traditional word-processing systems: binary codes embedded within document text that produce the WYSIWYG (“what you see is what you get“) effect. Such markup is usually hidden from the human users, even authors and editors. Properly speaking, such systems use procedural and/or descriptive markup underneath, but convert it to “present” to the user as geometric arrangements of type.
Markup is embedded in text which provides instructions for programs to process the text. Well-known examples include troff, TeX, and Markdown. It is assumed that software processes the text sequentially from beginning to end, following the instructions as encountered. Such text is often edited with the markup visible and directly manipulated by the author. Popular procedural markup systems usually include programming constructs, especially macros, allowing complex sets of instructions to be invoked by a simple name (and perhaps a few parameters). This is much faster, less error-prone, and maintenance-friendly than re-stating the same or similar instructions in many places.
- Markup is specifically used to label parts of the document for what they are, rather than how they should be processed. Well-known systems that provide many such labels include LaTeX, HTML, and XML. The objective is to decouple the structure of the document from any particular treatment or rendition of it. Such markup is often described as “semantic”. An example of a descriptive markup would be HTML’s
<cite>tag, which is used to label a citation. Descriptive markup — sometimes called logical markup or conceptual markup — encourages authors to write in a way that describes the material conceptually, rather than visually.
There is considerable blurring of the lines between the types of markup. In modern word-processing systems, presentational markup is often saved in descriptive-markup-oriented systems such as XML, and then processed procedurally by implementations. The programming in procedural-markup systems, such as TeX, may be used to create higher-level markup systems that are more descriptive in nature, such as LaTeX.
In recent years, a number of markup languages have been developed with ease of use as a key goal, and without input from standards organizations, aimed at allowing authors to create formatted text via web browsers, for example in wikis and in web forums. These are sometimes called lightweight markup languages. Markdown, BBCode, and the markup language used by Wikipedia are examples of such languages.
The first well-known public presentation of markup languages in computer text processing was made by William W. Tunnicliffe at a conference in 1967, although he preferred to call it generic coding. It can be seen as a response to the emergence of programs such as RUNOFF that each used their own control notations, often specific to the target typesetting device. In the 1970s, Tunnicliffe led the development of a standard called GenCode for the publishing industry and later was the first chairman of the International Organization for Standardization committee that created SGML, the first standard descriptive markup language. Book designer Stanley Rice published speculation along similar lines in 1970.
Brian Reid, in his 1980 dissertation at Carnegie Mellon University, developed the theory and a working implementation of descriptive markup in actual use. However, IBM researcher Charles Goldfarb is more commonly seen today as the “father” of markup languages. Goldfarb hit upon the basic idea while working on a primitive document management system intended for law firms in 1969, and helped invent IBM GML later that same year. GML was first publicly disclosed in 1973.
In 1975, Goldfarb moved from Cambridge, Massachusetts to Silicon Valley and became a product planner at the IBM Almaden Research Center. There, he convinced IBM’s executives to deploy GML commercially in 1978 as part of IBM’s Document Composition Facility product, and it was widely used in business within a few years.
SGML, which was based on both GML and GenCode, was an ISO project worked on by Goldfarb beginning in 1974. Goldfarb eventually became chair of the SGML committee. SGML was first released by ISO as the ISO 8879 standard in October 1986.
JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data. JSON filenames use the extension .json. Any valid JSON file is a valid JavaScript (.js) file, even though it makes no changes to a web page on its own.
![]()
Douglas Crockford originally specified the JSON format in the early 2000s. He and Chip Morning star sent the first JSON message in April 2001.
The acronym originated at State Software, a company co-founded by Douglas Crockford and others in March 2013.
The 2017 international standard (ECMA-404 and ISO/IEC 21778:2017) specifies “Pronounced /ˈdʒeɪ.sən/, as in ‘Jason and The Argonauts'”. The first (2013) edition of ECMA-404 did not address the pronunciation. The UNIX and Linux System Administration Handbook states that “Douglas Crock ford, who named and promoted the JSON format, says it’s pronounced like the name Jason. But somehow, ‘JAY-sawn’ seems to have become more common in the technical community.” Crock ford said in 2011, “There’s a lot of argument about how you pronounce that, but I strictly don’t care.”
After RFC 4627 had been available as its “informational” specification since 2006, JSON was first standardized in 2013, as ECMA-404. RFC 8259, published in 2017, is the current version of the Internet Standard STD 90, and it remains consistent with ECMA-404. That same year, JSON was also standardized as ISO/IEC 21778:2017. The ECMA and ISO/IEC standards describe only the allowed syntax, whereas the RFC covers some security and interoperability considerations.
