Tuesday, 18 April 2017

What is XML?

  Ø  What is XML?
·         XML is the eXtensible Markup Language.
·         Became a W3C Recommendation in 1998.
·         Tag-based syntax, very much like HTML.
·         We get to make up our own tags (or we can use an existing tag set to solve a particular problem).
·         Foundation for several next-gen Web Technologies like XHTML, RSS (Blogs), AJAX, Web Services.
·         XML does not replace HTML.
  Ø  What does XML do?
·         XML is used to structure and describe information.
·         Intended to be used with the Internet.
·         XML can be used as a way to interchange data between disparate systems.
  Ø  Related XML Technologies:
·         XPath:
ü  XML Path Language.
ü  Used to extract data from inside an XML file.
ü  Uses a path-like syntax, similar to directory or folder paths like “drive:/folder/folder/file”
·         XSLT:
ü  eXtensible Stylesheet Language Transformations.
ü  Styling language that takes an XML file and “transforms” it into something else, like HTML, PDF, ASCII, or even another XML file.
·         XQuery:
ü  Used to perform query functions on XML data, similar to SQL for databases.
·         XPointer and XLink:
ü  Used for creating hyperlinks to XML documents and arbitrary points within XML documents.
  Ø  Example: Describing Information
·         Typical Information found on a business card:
Santosh Tewari
(415) 555-4567 (mobile)
(800) 555-9876 (work)
(510) 555-1234 (fax)
imstkgp@gmail.com
·         Same business card data, expressed in XML:
<BusinessCard>
    <Name>Santosh Tewari</Name>
    <phone type="mobile">(415) 555-4567</phone>
    <phone type="work">(800) 555-9876</phone>
    <phone type="fax">(510) 555-1234</phone>
    <email>imstkgp@gmail.com</email>
</BusinessCard>
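As a rough illustration of how an application might read this card, the sketch below uses Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The variable names (CARD, name, mobile, phones) are made up for this example.

```python
import xml.etree.ElementTree as ET

# The business card from the example above, held in a string.
CARD = """<BusinessCard>
    <Name>Santosh Tewari</Name>
    <phone type="mobile">(415) 555-4567</phone>
    <phone type="work">(800) 555-9876</phone>
    <phone type="fax">(510) 555-1234</phone>
    <email>imstkgp@gmail.com</email>
</BusinessCard>"""

root = ET.fromstring(CARD)

# Look up a child element by tag name and read its text.
name = root.findtext("Name")

# Path with an attribute predicate: the phone whose type is "mobile".
mobile = root.findtext("phone[@type='mobile']")

# Collect every phone number, keyed by its type attribute.
phones = {p.get("type"): p.text for p in root.findall("phone")}

print(name)
print(mobile)
print(phones)
```

Because each phone number carries a type attribute, the program can pick out the one it needs without guessing from position or formatting.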
  Ø  Advantages Of XML:
·         Content is kept separately from any notion of presentation.
·         Information can be easily read and understood.
·         Specific tag sets that target specific problems can be easily created.
·         XML is an open format that can be processed by any XML aware application, like a browser, word processor, spreadsheet, etc.
·         XML data can be exchanged between systems that were not originally designed to work together.
  Ø  Disadvantages Of XML:
·         XML is not especially good at handling very large amounts of data.
·         XML can quickly become more difficult to read if a lot of information is included in one file.
·         Certain types of data (images, other binary data) are not represented well in XML.
  Ø  Proper XML Syntax:
·         All XML documents have a single root tag.
·         XML documents must be “well-formed”
ü  Every element must be closed. Empty tags, i.e. tags that do not contain any content themselves, can be closed with />
i.e. <elem></elem> can be shortened to <elem />
ü  Attributes cannot be minimized
i.e. <element attr> is wrong – must do <element attr="attr">
ü  Attribute values must always be inside quotes, either single or double
i.e. <element attr=val> is wrong – must do <element attr="val">
ü  Tags must be properly nested inside each other
i.e. <elem1><elem2></elem1></elem2> is wrong – always do <elem1><elem2></elem2></elem1>
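One way to try these rules out is to hand small snippets to a parser and see which ones it rejects. The sketch below uses Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the samples list are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc, False if it raises ParseError."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


samples = [
    "<root><elem /></root>",            # empty element, short form
    "<root><elem></elem></root>",       # empty element, long form
    "<element attr='val'></element>",   # single-quoted attribute value
    "<root><elem></root>",              # elem is never closed
    "<element attr></element>",         # minimized attribute
    "<element attr=val></element>",     # unquoted attribute value
    "<elem1><elem2></elem1></elem2>",   # improperly nested tags
    "<a/><b/>",                         # more than one root element
]

for doc in samples:
    print(is_well_formed(doc), doc)
```

The first three samples are accepted; the remaining five each break one of the rules listed above and are rejected.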
  Ø  Contents of XML Files:
·         XML Declaration:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
·         Elements (Sometimes called tags)
<element>, just like we’ve used in HTML.
·         Attributes
<element attr="val">, again like HTML.
·         Comments
<!-- XML Comment -->
·         Character Data Sections
<![CDATA[This is some text & data]]>
·         Processing Instructions
<?SpellCheckMode mode="us-english" ?>
·         Entity References
Character (&#60;) and General (&copyright;)
  Ø  The XML Declaration:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
·         Optional, though the W3C recommends including it
§  Identifies the file as being XML.
§  Provides a place for the encoding and standalone declarations.
§  Must be at the very beginning of the file, with not even whitespace before it.
·         The encoding declaration identifies the encoding of the document
§  UTF-8 is assumed if we don’t state otherwise.
·         The standalone declaration states whether the document stands by itself or references some external markup.
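The placement rule can be checked with a parser. The sketch below uses Python's standard xml.etree.ElementTree module and assumes Python 3.8 or later for the xml_declaration argument. The helper name parses and the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the parser accepts data, False if it raises ParseError."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


with_decl = b'<?xml version="1.0" encoding="utf-8" standalone="yes"?><doc/>'
without_decl = b"<doc/>"                 # the declaration is optional
leading_space = b" " + with_decl         # whitespace before the declaration

print(parses(with_decl))
print(parses(without_decl))
print(parses(leading_space))

# Ask the serializer to write a declaration in front of the root element.
serialized = ET.tostring(ET.Element("doc"), encoding="utf-8", xml_declaration=True)
print(serialized)
```

The first two documents are accepted, while the one with a leading space is rejected because the declaration is no longer at the start of the file.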
  Ø  Elements (Tags):
·         Elements must have a valid name
§  Begin with an underscore ( _ ) or a letter, then have zero or more letters, digits, periods, hyphens or underscores.
§  Can’t begin with the string “xml” in any case combination; such names are reserved.
·         Valid Names:
<_Element1>
<My.Element>
<My.Ele_ment>
·         Invalid Names:
<1Tag> (Wrong: can’t begin with a digit)
<#Element> (Wrong: invalid character in name)
<Element&Name> (Wrong: invalid character in name)
<XmL> (Wrong: names beginning with xml are reserved)
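The naming rules above can be approximated with a regular expression. This is a simplified, ASCII-only sketch in Python: the real XML name rule also allows colons and many non-ASCII letters, and NAME_RE and is_valid_name are hypothetical names introduced here.

```python
import re

# ASCII-only simplification: underscore or letter first, then letters,
# digits, periods, hyphens or underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name: str) -> bool:
    """Check a name against the simplified rules described above."""
    if NAME_RE.fullmatch(name) is None:
        return False
    # Names beginning with "xml" (in any case combination) are reserved.
    return not name.lower().startswith("xml")


for candidate in ["_Element1", "My.Element", "My.Ele_ment",
                  "1Tag", "#Element", "Element&Name", "XmL"]:
    print(candidate, is_valid_name(candidate))
```

The three valid names from the list pass, and the four invalid ones fail.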

  Ø  Attributes:
·         Attributes are specified on the start tag of an element.
·         Must start with underscore ( _ ) or letter, then have zero or more letters, digits, periods, hyphens or underscore.
·         Attribute names that start with “xml” are reserved.
·         An attribute can only appear one time on a given element. For example, <element attr1="a" attr1="b"> is wrong.
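A short sketch of how a parser treats attributes, using Python's standard xml.etree.ElementTree module. The variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Attributes on the start tag are exposed as a name-to-value mapping.
elem = ET.fromstring('<element attr1="a" attr2="b">text</element>')
attrs = dict(elem.attrib)
print(attrs)

# Repeating an attribute name on one element is a well-formedness error.
try:
    ET.fromstring('<element attr1="a" attr1="b" />')
    duplicate_rejected = False
except ET.ParseError as err:
    duplicate_rejected = True
    print("rejected:", err)
```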
  Ø  Comments:
·         Comments begin with <!-- and end with -->.
·         They can contain any characters except a double hyphen (&, <, etc. are allowed)
·         Comments can appear pretty much anywhere in an XML file as long as they are not inside another markup element and are not before the xml declaration.
<element <!-- comment -->> is wrong.
<element>
            <!-- Comment -->
</element> is correct.
  Ø  Character Data Section:
·         Used to contain character data that we want to be part of the document content but don’t want the parser to try and parse.
·         Have the form <![CDATA[Some text here]]>
·         Typically used when the character data contains a lot of characters (like & or <) that would otherwise be illegal in XML markup.
·         They do not nest like elements: we can’t have a CDATA section inside another CDATA section.
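To see what a CDATA section buys us, the sketch below parses the same text three ways with Python's standard xml.etree.ElementTree module: wrapped in CDATA, escaped with entities, and left raw. The variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Inside CDATA, & and < are taken literally and not parsed as markup.
cdata_doc = "<note><![CDATA[This is some text & data, even <tags>]]></note>"

# Without CDATA, the same characters have to be escaped one by one.
escaped_doc = "<note>This is some text &amp; data, even &lt;tags&gt;</note>"

cdata_text = ET.fromstring(cdata_doc).text
escaped_text = ET.fromstring(escaped_doc).text
print(cdata_text)
print(cdata_text == escaped_text)

# A bare & outside CDATA is illegal and makes the document malformed.
try:
    ET.fromstring("<note>This is some text & data</note>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)
```

Both the CDATA and the escaped form give the application the same text; the CDATA form is simply easier to write when there are many special characters.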
  Ø  Processing Instructions:
·         Processing instructions are special instructions that are only of interest to the application that is processing the XML.
·         Have the form <?target instruction ?>
§  The “xml” target name is reserved.
§  Target names can start with underscore ( _ ) or letter, then have zero or more letters, digits, periods, hyphens or underscore.
·         Example: Our application has different spell checking modes, and we want to be able to specify the mode in the document:
<?SpellCheckMode mode="us-english"?>
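As a sketch of how an application could pick up such an instruction, the example below reads it with Python's standard xml.dom.minidom module. The surrounding document element and the variable names are made up for this example.

```python
from xml.dom import minidom

DOC = '<?SpellCheckMode mode="us-english"?><document>Some text</document>'

dom = minidom.parseString(DOC)

# Collect (target, data) pairs for every processing instruction that sits
# at the top level of the document.
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == minidom.Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```

The parser hands over the target name and the rest of the instruction as one plain string. Interpreting mode="us-english" is left entirely to the application.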
  Ø  Entities:
·         Provide a way to help shorten and modularize our XML document.
·         Provide a way to include characters that would otherwise be illegal to type in markup.
·         General Entities:
We can define these to be whatever we want, and they will be replaced by the parser with a full string.
&copyright; or &author;
·         Character Entities:
&#060; (a numeric character reference for <)
&amp; or &quot; (predefined entities for & and ")
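The sketch below shows both kinds of entity being replaced by the parser, using Python's standard xml.etree.ElementTree module. The general entity is declared in an internal DTD subset at the top of the document. The document and its element names (card, name, note) are made up for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE card [
  <!ENTITY author "Santosh Tewari">
]>
<card>
  <name>&author;</name>
  <note>1 &#60; 2 &amp; 3 &gt; 2</note>
</card>"""

root = ET.fromstring(DOC)

# The general entity &author; is replaced with the string from the DTD.
author = root.findtext("name")

# Character references and predefined entities become literal characters.
note = root.findtext("note")

print(author)
print(note)
```

Expanding entities from untrusted input can be risky (deeply nested definitions can blow up in size), so this sketch is only meant for documents we control.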

  Ø  Valid XML Documents:
·         A valid XML Document is one that is well-formed and has also been checked against a set of rules describing its allowed structure.
·         These rules are specified as either DTD (Document Type Definition) or XML Schema files.
·         DTDs are simpler but less powerful and don’t use the same syntax as XML itself.

·         Schema is more powerful but more complex and is written using XML syntax.
