Ch. 2: Markup Language and Site Development Essentials
Standard Generalized Markup Language (SGML)
SGML is a metalanguage, which means that it is used to create other languages, including HTML and XHTML. SGML originated at IBM and was standardized in 1986 by the International Organization for Standardization (ISO). SGML is a powerful markup language that describes documents by organizing concepts separately from their visual presentation. However, it is also very complex and difficult to learn.
SGML was not IBM's first metalanguage. IBM created the Generalized Markup Language (GML) in the late 1960s as a way to use formatted documents across different computer platforms. GML then evolved into SGML.
SGML's purpose was to describe only the information within a document, not the formatting of it. With SGML, you can describe how data elements in the document relate to each other. SGML was not designed to format the data's appearance on the page.
SGML essentially requires that you create, or define, your own document language rules. This set of language rules is called the Document Type Definition (DTD). The DTD is generally specified in a separate file, which you reference, or declare, at the beginning of each document that you want to conform to the rules. Once the DTD is established, all elements in the document must conform to it.
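The idea of checking a document against a set of language rules can be sketched in a few lines of Python. This is a simplified stand-in, not a real DTD: the `MEMO_RULES` table, the memo element names and the `find_violations` function are all hypothetical, and the sample documents use XML-style syntax only so that the standard library can parse them. A real DTD is written in its own declaration syntax and is enforced by a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set standing in for a DTD: each element name maps to
# the set of child elements it may contain.
MEMO_RULES = {
    "memo": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": {"para"},
    "para": set(),
}

def find_violations(xml_text, rules):
    """Return a list of messages for elements that break the rules."""
    problems = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        if parent.tag not in rules:
            problems.append(f"undeclared element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in rules[parent.tag]:
                problems.append(
                    f"<{child.tag}> is not allowed inside <{parent.tag}>"
                )
    return problems

good = "<memo><to>Ana</to><from>Raj</from><body><para>Hi.</para></body></memo>"
bad = "<memo><to>Ana</to><subject>Lunch</subject></memo>"

print(find_violations(good, MEMO_RULES))  # no violations: []
print(find_violations(bad, MEMO_RULES))   # two messages about <subject>
```

The second document fails because `subject` was never declared, which mirrors the requirement that every element conform to the established rules.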
Hypertext Markup Language (HTML)
Tim Berners-Lee, who later joined MIT, invented Hypertext Markup Language (HTML) with colleagues at CERN (the European Particle Physics Laboratory) as a means of distributing nonlinear text, called hypertext, to multiple points across the Internet. Berners-Lee felt that SGML and other languages were needlessly complex and did not suit the need for a cross-platform language that helped format documents.
In HTML, one document links to another via pointers called hyperlinks. Hyperlinks are embedded instructions within a text file that call another location in the file or a separate file when the link is accessed, usually by a click of a mouse. The global set of linked documents across the existing Internet framework grew into the World Wide Web.
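As a rough illustration of hyperlinks being instructions embedded in a text file, the following Python sketch reads a small page and lists where each link points. The page content, the file name `chapter3.html` and the `LinkCollector` class are made up for the example; only `html.parser` from the Python standard library is assumed.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> hyperlink."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page: one link to a location in the same file,
# one link to a separate file.
PAGE = """
<p>See the <a href="#glossary">glossary</a> below or the
<a href="chapter3.html">next chapter</a>.</p>
<h2 id="glossary">Glossary</h2>
"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['#glossary', 'chapter3.html']
```

The first target begins with `#`, so it names a location within the same file; the second names a separate file.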
Hypertext and Hypermedia
Hypermedia is an extension of hypertext. It includes images, video, audio, animation and other multimedia data types, which can be incorporated into HTML documents. The Web can accurately be described as a hypermedia system.
Hypertext was first conceived by Ted Nelson in 1965. The first widely commercialized hypertext product was HyperCard, conceived by Bill Atkinson and introduced by Apple Computer in 1987. It incorporated many hypertext and hypermedia concepts, but was a proprietary system that worked only on Macintosh computers. By contrast, HTML is a cross-platform language that works on Windows, Macintosh and UNIX platforms. In addition, HTML and the Web are client/server systems, whereas HyperCard ran only on stand-alone Macintosh computers.
HTML vs SGML
Like SGML, HTML facilitates data exchange through a common document format across different types of computer systems and networks on the Web. However, HTML does not allow you to define a DTD and has fewer language elements than SGML. As a result, HTML is easier to use and has become the standard method of encoding information for Web documents.
Markup Languages
A markup language is very different from a programming language. Programming languages such as C, C++, Java and C# must be compiled before they are used. Applications that are compiled have separate program files and data files. In a markup language, the instructions and the data generally reside in the same file. Some instructions may reside in separate files, but markup languages generally do not require complex supporting libraries. Markup languages do not need to be compiled. In addition, HTML does not provide data structures or internal logic, as do procedural programming languages such as C and Pascal.
Whereas SGML is used specifically to define context as opposed to appearance, HTML has evolved into both a contextual and a formatting language. HTML files are plain text files that have been "marked up" with special language elements called tags, which are embedded in the text.
Tags are pieces of text, enclosed in angle brackets (or "wickets"), that provide instructions to programs designed to interpret HTML.
Interpreters
HTML interpreters are programs that process the HTML pages and render them to the user as text pages formatted in accordance with the embedded instructions. Examples of HTML interpreters are Web browsers such as Opera, Lynx, Firefox and Internet Explorer.
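To make the idea of an interpreter concrete, here is a minimal sketch of one in Python. It is nowhere near a real browser: the `TextRenderer` class and `render` function are hypothetical, they understand only the `h1` and `p` tags, and they treat every other tag as an instruction to ignore. The heading is shown in capitals as a stand-in for visual emphasis.

```python
from html.parser import HTMLParser

class TextRenderer(HTMLParser):
    """Turns a small subset of HTML into plain text lines."""

    def __init__(self):
        super().__init__()
        self.lines = []          # finished output lines
        self.current = ""        # text collected for the current block
        self.in_heading = False

    def _flush(self):
        # Collapse runs of whitespace, as browsers do for normal text.
        text = " ".join(self.current.split())
        if text:
            self.lines.append(text.upper() if self.in_heading else text)
        self.current = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._flush()
            self.in_heading = (tag == "h1")

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self._flush()
            self.in_heading = False

    def handle_data(self, data):
        # Text between tags is the document's content.
        self.current += data

def render(html_text):
    renderer = TextRenderer()
    renderer.feed(html_text)
    renderer.close()
    renderer._flush()  # emit any text that follows the last block tag
    return renderer.lines

PAGE = "<h1>Markup basics</h1><p>Tags are <b>instructions</b>,\n   not content.</p>"
print(render(PAGE))
# ['MARKUP BASICS', 'Tags are instructions, not content.']
```

The tags never appear in the output. They only change how the surrounding text is presented, which is the division of labor between markup and interpreter described above.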
Although HTML was specifically designed for use on the World Wide Web, many businesses are finding uses for HTML documents that have little or nothing to do with the Web. HTML files are very small and extremely portable, making this format an ideal choice when exchanging documents across any type of network.
HTML 3.2 Standard and HTML 4.01
HTML 3.2 is an older but still fully functional HTML standard. Many Web pages and HTML editors still use the 3.2 standard. This standard is widely supported because many people surf the Web using older Web browsers that cannot process all the elements required by the newer HTML 4.01 Recommendation.
The HTML 4.01 Recommendation was released in December 1999 and contains the latest specifications.
HTML 4.01 allows you to use Cascading Style Sheets (CSS) and supports multiple languages. For example, HTML 4.01 allows you to create Web pages in languages such as Hebrew, which is read from right to left.
HTML 4.01 flavors
The HTML 4.01 flavors ensure that you can use the latest specification, yet remain backward-compatible with older Web browsers.