Ø What is XML?
· XML is the eXtensible Markup Language.
· Became a W3C Recommendation in 1998.
· Tag-based syntax, very much like HTML.
· We get to make up our own tags (or we can use an existing tag set to solve a particular problem).
· Foundation for several next-gen Web technologies like XHTML, RSS (blogs), AJAX, and Web Services.
· XML does not replace HTML.
Ø What does XML do?
· XML is used to structure and describe information.
· Intended to be used with the Internet.
· XML can be used as a way to interchange data between disparate systems.
Ø Related XML Technologies:
· XPath:
ü XML Path Language.
ü Used to extract data from inside an XML file.
ü Uses a path-like syntax, similar to directory or folder paths like “drive:/folder/folder/file”.
· XSLT:
ü eXtensible Stylesheet Language Transformations.
ü Transformation language that takes an XML file and “transforms” it into something else, like HTML, plain text (ASCII), or even another XML file. PDF output is possible too, typically by way of XSL-FO.
· XQuery:
ü Used to perform query functions on XML data, similar to SQL for databases.
· XPointer and XLink:
ü Used for creating hyperlinks to XML documents and arbitrary points within XML documents.
Ø Example: Describing Information
· Typical information found on a business card:
Santosh Tewari
(415) 555-4567 (mobile)
(800) 555-9876 (work)
(510) 555-1234 (fax)
imstkgp@gmail.com
· Same business card data, expressed in XML:
<BusinessCard>
  <Name>Santosh Tewari</Name>
  <phone type="mobile">(415) 555-4567</phone>
  <phone type="work">(800) 555-9876</phone>
  <phone type="fax">(510) 555-1234</phone>
  <email>imstkgp@gmail.com</email>
</BusinessCard>
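To see how an application might consume the business card, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. The helper name read_card and the dictionary layout are illustrative assumptions, not part of any standard. The last line uses the small XPath subset that ElementTree supports to pick out one phone by its type attribute.

```python
import xml.etree.ElementTree as ET

CARD_XML = """\
<BusinessCard>
  <Name>Santosh Tewari</Name>
  <phone type="mobile">(415) 555-4567</phone>
  <phone type="work">(800) 555-9876</phone>
  <phone type="fax">(510) 555-1234</phone>
  <email>imstkgp@gmail.com</email>
</BusinessCard>
"""


def read_card(xml_text):
    """Turn the business card XML into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("Name"),
        "email": root.findtext("email"),
        # Map each phone's type attribute to its number.
        "phones": {p.get("type"): p.text for p in root.findall("phone")},
    }


card = read_card(CARD_XML)

# XPath-style predicate: select the phone whose type attribute is "work".
work_phone = ET.fromstring(CARD_XML).find("phone[@type='work']").text
```

Because the tags describe the data rather than its appearance, the same file could just as easily feed an address book, a web page, or a printed card.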
Ø Advantages Of XML:
· Content is kept separately from any notion of presentation.
· Information can be easily read and understood.
· Specific tag sets that target specific problems can be easily created.
· XML is an open format that can be processed by any XML-aware application, like a browser, word processor, spreadsheet, etc.
· XML data can be exchanged between systems that were not originally designed to work with each other.
Ø Disadvantages Of XML:
· XML is not especially good at handling very large amounts of data.
· XML can quickly become difficult to read if a lot of information is included in one file.
· Certain types of data (images, other binary data) are not represented well in XML.
Ø Proper XML Syntax:
· All XML documents have a single root tag.
· XML documents must be “well-formed”:
ü Every tag must be closed, including empty tags, i.e. tags that do not contain any content themselves. Unlike HTML, a bare <elem> is not allowed; write either <elem></elem> or the shorthand <elem />.
ü Attributes cannot be minimized, i.e. <element attr> is wrong – must do <element attr="attr">
ü Attribute values must always be inside quotes, either single or double, i.e. <element attr=val> is wrong – must do <element attr="val">
ü Tags must be properly nested inside each other: <elem1><elem2></elem1></elem2> is wrong – always do <elem1><elem2></elem2></elem1>
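One way to check these rules is to hand each snippet to a real parser and see whether it is accepted. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Snippets taken from the rules above, mapped to the parser's verdict.
results = {
    snippet: is_well_formed(snippet)
    for snippet in [
        "<elem />",                         # empty tag, shorthand form
        "<elem></elem>",                    # empty tag, long form
        "<elem>",                           # never closed
        "<element attr />",                 # minimized attribute
        "<element attr=val />",             # unquoted attribute value
        "<element attr='val' />",           # single quotes are fine
        "<elem1><elem2></elem1></elem2>",   # badly nested
        "<elem1><elem2></elem2></elem1>",   # properly nested
        "<a /><b />",                       # two root tags
    ]
}

for snippet, ok in results.items():
    print("OK   " if ok else "ERROR", snippet)
```

Note that a well-formedness check like this says nothing about whether the tags make sense for a given application; it only confirms that the syntax rules are followed.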
Ø Contents of XML Files:
· XML Declaration:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
· Elements (sometimes called tags)
<element>, just like we’ve used in HTML.
· Attributes
<element attr="val">, again like HTML.
· Comments
<!-- XML Comment -->
· Character Data Sections
<![CDATA[This is some text & data]]>
· Processing Instructions
<?SpellCheckMode mode="us-english" ?>
· Entity References
Character (&lt;) and General (&copyright;)
Ø The XML Declaration:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
· Optional, though the W3C recommends including it
§ Identifies the file as being XML.
§ Provides a place for the encoding and standalone declarations.
§ Must be at the very beginning of the file - not even whitespace before it.
· The encoding declaration identifies the encoding of the document
§ UTF-8 is assumed if we don’t state otherwise (and the file has no byte order mark signalling UTF-16).
· The standalone declaration states whether the document stands by itself or depends on external markup declarations, such as an external DTD.
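The following sketch, assuming Python 3.8 or later and its standard xml.etree.ElementTree module, shows a serializer emitting the declaration and a parser rejecting a file that has whitespace in front of it. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# xml_declaration=True asks the serializer to write the declaration line first.
root = ET.Element("doc")
serialized = ET.tostring(root, encoding="utf-8", xml_declaration=True)


def parses(data):
    """Return True if the bytes parse as XML, False on a ParseError."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


declaration = b'<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'

at_start = parses(declaration + b"<doc/>")             # declaration is the first thing in the file
after_space = parses(b" " + declaration + b"<doc/>")   # one leading space before it
no_declaration = parses(b"<doc/>")                     # the declaration is optional
```

Here at_start and no_declaration come back True, while after_space comes back False because the declaration is no longer at the very beginning.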
Ø Elements (Tags):
· Elements must have a valid name
§ Begin with underscore _ or letter, then have zero or more letters, digits, periods, hyphens or underscores.
§ Can’t begin with the string “xml” in any case combination; such names are reserved.
· Valid Names:
<_Element1>
<My.Element>
<My.Ele_ment>
· Invalid Names:
<1Tag> (Wrong: can’t begin with numbers)
<#Element> (Wrong: invalid character in name)
<Element&Name> (Wrong: invalid character in name)
<XmL> (Wrong: xml is reserved)
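The naming rule can be approximated with a regular expression. This is a simplified, ASCII-only sketch of the rule as stated above; the real XML specification also permits colons (used for namespaces) and a wide range of non-ASCII letters. The names NAME_RE and is_valid_name are assumptions made for this example.

```python
import re

# First character: letter or underscore. Remaining characters: letters,
# digits, periods, hyphens or underscores. ASCII only, for simplicity.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name):
    """Check a tag name against the simplified rule described above."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (any case) are reserved
    return NAME_RE.fullmatch(name) is not None


for candidate in ["_Element1", "My.Element", "My.Ele_ment",
                  "1Tag", "#Element", "Element&Name", "XmL"]:
    print(candidate, "->", "valid" if is_valid_name(candidate) else "invalid")
```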
Ø Attributes:
· Attributes are specified on the start tag of an element.
· Names must start with underscore ( _ ) or letter, then have zero or more letters, digits, periods, hyphens or underscores.
· Attributes that start with “xml” are reserved.
· Attributes can only appear one time on a given element. For example, <element attr1="a" attr1="b"> is wrong.
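As a quick illustration of the one-time rule, the sketch below feeds a duplicated attribute to Python's standard xml.etree.ElementTree parser and captures the error. The helper name parse_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_error(xml_text):
    """Return the parser's error message, or None if the text parses cleanly."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


duplicate = parse_error('<element attr1="a" attr1="b" />')   # same name twice
distinct = parse_error('<element attr1="a" attr2="b" />')    # two different names

# When the names are distinct, the attributes are exposed as a dictionary.
attributes = ET.fromstring('<element attr1="a" attr2="b" />').attrib
```

Since attributes end up in a dictionary keyed by name, a repeated name would be ambiguous, which is one practical reason parsers refuse it.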
Ø Comments:
· Comments begin with <!-- and end with -->.
· They can contain any characters except a double hyphen (&, <, etc. are allowed).
· Comments can appear pretty much anywhere in an XML file as long as they are not inside another piece of markup (such as a tag) and are not before the XML declaration.
<element <!-- comment -->> is wrong.
<element>
  <!-- Comment -->
</element>
is correct.
Ø Character Data Section:
· Used to contain character data that we want to be part of the document content but don’t want the parser to try to parse.
· Has the form <![CDATA[Some text here]]>
· Typically used when the character data contains a lot of characters (like & or <) that would otherwise be illegal in XML markup.
· They do not nest like elements - we can’t have a CDATA section inside another CDATA section.
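A small sketch, assuming Python's standard xml.etree.ElementTree module, shows that text inside a CDATA section reaches the application untouched, and that it is equivalent to escaping each special character by hand. The <note> element is an invented example.

```python
import xml.etree.ElementTree as ET

# The & and < inside the CDATA section are passed through as plain text.
with_cdata = ET.fromstring(
    "<note><![CDATA[This is some text & data <b>]]></note>"
)

# The same content written without CDATA needs every special character escaped.
escaped = ET.fromstring(
    "<note>This is some text &amp; data &lt;b&gt;</note>"
)

print(with_cdata.text)
print(escaped.text)
```

Both forms produce the same text; CDATA is simply more convenient when there are many such characters, for example when embedding a code snippet.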
Ø Processing Instructions:
· Processing instructions are special instructions that are only of interest to the application that is processing the XML.
· Have the form <?target instruction ?>
§ The “xml” target name is reserved.
§ Target names can start with underscore ( _ ) or letter, then have zero or more letters, digits, periods, hyphens or underscores.
· Example: Our application has different spell checking modes, and we want to be able to specify the mode in the document:
<?SpellCheckMode mode="us-english"?>
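An application can pick up such an instruction while reading the document. The sketch below assumes Python's standard xml.dom.minidom module; the <doc/> root element and the helper name find_instructions are invented for the example.

```python
from xml.dom import minidom

document = minidom.parseString('<?SpellCheckMode mode="us-english"?><doc/>')


def find_instructions(dom):
    """Collect (target, data) pairs for top-level processing instructions."""
    return [
        (node.target, node.data)
        for node in dom.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


instructions = find_instructions(document)
print(instructions)
```

The parser hands over the target and the rest of the instruction as an opaque string; interpreting mode="us-english" is entirely up to the application.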
Ø Entities:
· Provide a way to help shorten and modularize our XML document.
· Provide a way to include characters that would otherwise be illegal to type in markup.
· General Entities: We can define these to be whatever we want, and they will be replaced by the parser with a full string.
&copyright; or &author;
· Character Entities:
&lt; &amp; or &quot;
Ø Valid XML Documents:
· A valid XML document is one that is well-formed and has also been checked against a set of rules describing its allowed structure.
· These rules are specified as either DTD (Document Type Definition) or XML Schema files.
· DTDs are simpler but less powerful and don’t use the same syntax as XML itself.
· Schemas are more powerful but more complex and are written using XML syntax.