An HTML element is defined by a start tag, some content, and an end tag. The HTML element is everything from the start tag to the end tag:
<tagname>Content goes here...</tagname>
Examples:
<h1>Welcome to the crawler team's web page!</h1>
<p>This is a simple test page</p>
Start tag | Element content | End tag |
---|---|---|
<h1> | Welcome to the crawler team's web page! | </h1> |
<p> | This is a simple test page | </p> |
<br> | none | none |
Note: Some HTML elements have no content (like the <br> element). These elements are called empty elements. Empty elements do not have an end tag!
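The start tag / content / end tag split can be observed from a crawler's point of view with Python's standard `html.parser` module. The following is only a minimal sketch: the class name `TagLogger` and the shortened sample markup are illustrative assumptions, not part of the original page.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each start tag, text run, and end tag in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.events.append(("content", data.strip()))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
# Hypothetical sample markup: two normal elements and one empty element.
logger.feed("<h1>Welcome</h1><p>A simple test page<br></p>")
logger.close()

for kind, value in logger.events:
    print(kind, value)
```

In the recorded events, `br` shows up as a start tag only, which matches the note above: an empty element has neither content nor an end tag.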
Common elements explained:
The <!DOCTYPE html> declaration defines that this document is an HTML5 document.
The <html> element is the root element of an HTML page.
The <head> element contains meta information about the HTML page.
The <title> element specifies a title for the HTML page (shown in the browser’s title bar or in the page’s tab).
The <body> element defines the document’s body and is a container for all the visible contents, such as headings, paragraphs, images, hyperlinks, tables, and lists.
The <h1> element defines the largest heading; <h2> defines the next, smaller heading level.
The <p> element defines a paragraph.
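To see how these common elements fit together, the sketch below parses a small page and collects the text of its title, heading, and paragraph. The page content, the `OutlineParser` class, and the `WANTED` set are assumptions made for illustration only.

```python
from html.parser import HTMLParser

# Hypothetical minimal page using the elements described above.
PAGE = """<!DOCTYPE html>
<html>
<head><title>Test page</title></head>
<body>
<h1>Welcome</h1>
<p>A simple test page</p>
</body>
</html>"""


class OutlineParser(HTMLParser):
    """Collect the text of selected elements, keyed by tag name."""

    WANTED = {"title", "h1", "p"}

    def __init__(self):
        super().__init__()
        self.current = None  # wanted tag we are currently inside, if any
        self.found = {}      # tag name -> list of text values

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self.current = tag

    def handle_data(self, data):
        if self.current and data.strip():
            self.found.setdefault(self.current, []).append(data.strip())

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None


parser = OutlineParser()
parser.feed(PAGE)
parser.close()
print(parser.found)
```

This approach only handles non-nested text; a real crawler would more likely use a full DOM or tree-building library.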
HTML attributes provide additional information about HTML elements. All HTML elements can have attributes. Attributes are always specified in the start tag and usually come in name/value pairs such as name="value".
The href Attribute:
The <a> tag defines a hyperlink. The href attribute specifies the URL of the page the link goes to.
Example:
<a href="http://gitlabce.apps.dit-prdocp.novartis.net/YUHAY/web-crawler-do.git">Web Crawler DO</a>
The src Attribute:
The <img> tag is used to embed an image in an HTML page. The src attribute specifies the path to the image to be displayed.
Example:
<img src="https://www.w3schools.com/js/pic_bulboff.gif">
The style Attribute:
The style attribute is used to add styles to an element, such as color, font, size, and more.
Example:
<img style="width:30px">
The width and height attributes of <img> provide size information for images.
The alt attribute of <img> provides alternate text for an image.
The lang attribute of the <html> tag declares the language of the web page.
The title attribute defines some extra information about an element.
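Attributes are usually what a crawler is after: link targets in href and image paths in src. The sketch below reads them with Python's standard `html.parser`, which passes the attributes of each start tag as a list of (name, value) pairs. The class name `AttributeCollector`, the `example.com` URL, and the sample markup are placeholders assumed for this example.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect link targets and image sources from start tags."""

    def __init__(self):
        super().__init__()
        self.links = []   # href values of <a> tags
        self.images = []  # (src, alt) pairs of <img> tags

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append((attributes["src"], attributes.get("alt")))


collector = AttributeCollector()
# Hypothetical markup; the URL and file name are placeholders.
collector.feed(
    '<a href="https://example.com/page">A link</a>'
    '<img src="pic.gif" alt="A light bulb" width="30" height="30">'
)
collector.close()

print(collector.links)
print(collector.images)
```

Using `attributes.get("alt")` rather than indexing keeps the sketch from failing on images that have no alt text.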
HTML documents, like XML documents, can be treated as trees of nodes. Look at the following XML document:
<books>
<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>
The topmost element of the tree is called the root element. <books> is the root element node of the above tree. There are other element nodes, such as <author>J K. Rowling</author>, <year>2005</year>, etc.
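As a rough illustration of the tree view, the document above can be loaded with Python's standard `xml.etree.ElementTree` module and walked from the root downward. The helper name `walk` is an assumption introduced for this sketch.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<books>
<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>"""

root = ET.fromstring(DOCUMENT)  # the root element node, <books>


def walk(node, depth=0):
    """Return (depth, tag) pairs for node and every element below it."""
    rows = [(depth, node.tag)]
    for child in node:  # iterating an element yields its direct children
        rows.extend(walk(child, depth + 1))
    return rows


tree_rows = walk(root)
for depth, tag in tree_rows:
    print("  " * depth + tag)
```

The indentation of the printed tags mirrors the nesting of the elements in the document.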
The tree also resembles the paths of a computer file system: the title element above can be reached by the path books/book/title.
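The file-system analogy can be tried out with the limited path syntax that `xml.etree.ElementTree` supports. In this sketch, paths are written relative to the element they are called on, so with the root <books> element as the starting point the path to the title is `book/title`. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<books><book>"
    '<title lang="en">Harry Potter</title>'
    "<author>J K. Rowling</author>"
    "<year>2005</year><price>29.99</price>"
    "</book></books>"
)

# Paths are relative to the element they are called on; root is <books>.
title = root.find("book/title")
title_text = title.text
title_lang = title.get("lang")  # attribute lookup on the found element
price = float(root.findtext("book/price"))

print(title_text, title_lang, price)
```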
Example 1:
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
1. Parent
Each element has one parent.
In Example 1, the book element is the parent of the title, author, year, and price elements.
2. Children
Element nodes may have zero, one, or more children.
In Example 1, the title, author, year, and price elements are all children of the book element.
3. Siblings
Siblings are nodes that have the same parent.
In Example 1, the title, author, year, and price elements are all siblings.
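The parent, children, and sibling relationships of Example 1 can be checked with a short sketch. It assumes Python's standard `xml.etree.ElementTree`, whose elements do not store a reference to their parent, so a lookup table (here called `parent_of`, a name chosen for this example) is built first.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book>"
    "<title>Harry Potter</title>"
    "<author>J K. Rowling</author>"
    "<year>2005</year>"
    "<price>29.99</price>"
    "</book>"
)

# Map every child element to its parent element.
parent_of = {child: parent for parent in book.iter() for child in parent}

# Children: the elements directly inside <book>.
children = [child.tag for child in book]

# Parent: look up the element that contains <title>.
title = book.find("title")
parent_tag = parent_of[title].tag

# Siblings: the other children of the same parent.
siblings = [node.tag for node in parent_of[title] if node is not title]

print(children, parent_tag, siblings)
```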
Example 2:
<books>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>
4. Ancestors
A node’s ancestors are its parent, its parent’s parent, and so on.
In Example 2, the ancestors of the title element are the book element and the books element.
5. Descendants
A node’s descendants are its children, its children’s children, and so on.
In Example 2, the descendants of the books element are the book, title, author, year, and price elements.
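Ancestors and descendants in Example 2 can be listed in the same spirit. This is a sketch under the same assumption as before (Python's standard `xml.etree.ElementTree` plus a hand-built parent map); the helper name `ancestors` is hypothetical.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    "<books><book>"
    "<title>Harry Potter</title>"
    "<author>J K. Rowling</author>"
    "<year>2005</year>"
    "<price>29.99</price>"
    "</book></books>"
)

# Map every child element to its parent element.
parent_of = {child: parent for parent in books.iter() for child in parent}


def ancestors(node):
    """Walk upward through the parent map, nearest ancestor first."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node.tag)
    return chain


title = books.find("book/title")
title_ancestors = ancestors(title)

# iter() yields the element itself first, so skip it to keep descendants only.
descendants = [node.tag for node in books.iter() if node is not books]

print(title_ancestors)
print(descendants)
```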