CSS Selector & XPath

3.1 Introduction

CSS (Cascading Style Sheets) is a style sheet language for describing the presentation of a document written in a markup language such as HTML. CSS selectors pick out the content you want to style; they are the part of a CSS rule set that names the elements the rule applies to. A selector can match HTML elements by id, class, element type, attribute, and so on.

XPath (XML Path Language) is a query language for selecting nodes from an XML document. It can also compute values from the content of an XML document. XPath is defined by the World Wide Web Consortium (W3C).

CSS Selector vs XPath

No.	HTML	XML
1)	HTML is used to display data and focuses on how data looks.	XML is a software- and hardware-independent tool used to transport and store data. It focuses on what data is.
2)	HTML is a markup language itself.	XML provides a framework to define markup languages.
3)	HTML tag and attribute names are not case sensitive.	XML is case sensitive.
4)	HTML is a presentation language.	XML is neither a presentation language nor a programming language.
5)	HTML has its own predefined tags.	You can define tags according to your needs.
6)	In HTML, the closing tag is optional for some elements.	XML requires every element to be closed.
7)	HTML is static because it is used to display data.	XML is dynamic because it is used to transport data.
8)	HTML does not preserve whitespace.	XML preserves whitespace.

HTML vs XML

3.2 Examples

Sample HTML: link
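The sample page is only linked here and the helper `check_element_text` is not shown, so the following is a minimal sketch rather than the chapter's own code. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and no CSS selectors. The `SAMPLE` markup is a hypothetical stand-in inferred from the outputs printed in the examples below, and `element_text` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the linked sample page; the markup is an assumption
# inferred from the outputs printed in the examples of this section.
SAMPLE = """
<html><body>
  <div class="class1">
    <p class="class3">Welcome 2022! <t class="text1">Beat COVID-19!</t> <price>1000000000.00</price></p>
  </div>
  <div id="div2">
    <p class="class3">Happy Spring Festival! <t>Dumpling YYDS!</t> <price>100.00</price></p>
  </div>
  <div class="class1 class2">
    <p class="class3">Happy Valentine's Day! <t>Let's romantic!</t> <price>1000.0</price></p>
    <p>Happy Lantern Festival! <t>Sweet Dumpling YYDS!</t> <price>100.0</price></p>
  </div>
</body></html>
"""


def element_text(markup, path):
    """Return the whitespace-normalized text of each element matching `path`."""
    root = ET.fromstring(markup.strip())
    # itertext() yields the element's own text plus the text of all descendants.
    return [" ".join("".join(el.itertext()).split()) for el in root.findall(path)]


print(element_text(SAMPLE, ".//t"))
print(element_text(SAMPLE, ".//div[@id='div2']"))
```

ElementTree evaluates paths relative to an element, so the sketch writes `.//t` where the examples write `//t`.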

Example 1: Absolute path

check_element_text(html = html, css = "body > div > p > t")

[1] "Beat COVID-19!" "Dumpling YYDS!" "Let’s romantic!" "Sweet Dumpling YYDS!"

check_element_text(html = html, xpath = "/html/body/div/p/t")

[1] "Beat COVID-19!" "Dumpling YYDS!" "Let’s romantic!" "Sweet Dumpling YYDS!"

Example 2: Relative path

check_element_text(html = html, css = "body t")

[1] "Beat COVID-19!" "Dumpling YYDS!" "Let’s romantic!" "Sweet Dumpling YYDS!"

check_element_text(html = html, xpath = "//body//t")

[1] "Beat COVID-19!" "Dumpling YYDS!" "Let’s romantic!" "Sweet Dumpling YYDS!"
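The difference between spelling out every step and searching at any depth can be sketched with Python's standard-library `xml.etree.ElementTree`. This is an illustration under assumptions, not the chapter's helper: `SAMPLE` is a hypothetical stand-in for the linked page, and because ElementTree paths start from an element, the absolute `/html/body/...` is written as `./body/...` from the `html` root.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the linked sample page (markup is an assumption).
SAMPLE = """
<html><body>
  <div class="class1">
    <p class="class3">Welcome 2022! <t class="text1">Beat COVID-19!</t> <price>1000000000.00</price></p>
  </div>
  <div id="div2">
    <p class="class3">Happy Spring Festival! <t>Dumpling YYDS!</t> <price>100.00</price></p>
  </div>
  <div class="class1 class2">
    <p class="class3">Happy Valentine's Day! <t>Let's romantic!</t> <price>1000.0</price></p>
    <p>Happy Lantern Festival! <t>Sweet Dumpling YYDS!</t> <price>100.0</price></p>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())  # root is the <html> element

# Every level spelled out, like "body > div > p > t" or "/html/body/div/p/t".
full_path = [el.text for el in root.findall("./body/div/p/t")]

# Match <t> at any depth, like "body t" or "//body//t".
anywhere = [el.text for el in root.findall(".//t")]

# A child step that skips levels finds nothing: <t> is not a direct child of <body>.
skipped = root.findall("./body/t")

print(full_path == anywhere, len(skipped))
```

On this document both forms return the same four elements; they diverge as soon as a `t` element sits at a different depth than the full path describes.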

Example 3: Single element

check_element_text(html = html, css = "t")

[1] "Beat COVID-19!" "Dumpling YYDS!" "Let’s romantic!" "Sweet Dumpling YYDS!"

check_element_text(html = html, xpath = "//t")

[1] "Beat COVID-19!" "Dumpling YYDS!" "Let’s romantic!" "Sweet Dumpling YYDS!"

Example 4: Multiple elements

check_element_text(html = html, css = "t, price")

[1] "Beat COVID-19!" "1000000000.00"
[3] "Dumpling YYDS!" "100.00"
[5] "Let’s romantic!" "1000.0"
[7] "Sweet Dumpling YYDS!" "100.0"

check_element_text(html = html, xpath = "//t | //price")

[1] "Beat COVID-19!" "1000000000.00"
[3] "Dumpling YYDS!" "100.00"
[5] "Let’s romantic!" "1000.0"
[7] "Sweet Dumpling YYDS!" "100.0"
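Both outputs above interleave titles and prices, which reflects that a selector list and an XPath union return nodes in document order. The sketch below imitates that with Python's standard-library `xml.etree.ElementTree`, which has no `|` operator. `SAMPLE` is a hypothetical stand-in for the linked page, so treat the details as assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the linked sample page (markup is an assumption).
SAMPLE = """
<html><body>
  <div class="class1">
    <p class="class3">Welcome 2022! <t class="text1">Beat COVID-19!</t> <price>1000000000.00</price></p>
  </div>
  <div id="div2">
    <p class="class3">Happy Spring Festival! <t>Dumpling YYDS!</t> <price>100.00</price></p>
  </div>
  <div class="class1 class2">
    <p class="class3">Happy Valentine's Day! <t>Let's romantic!</t> <price>1000.0</price></p>
    <p>Happy Lantern Festival! <t>Sweet Dumpling YYDS!</t> <price>100.0</price></p>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# root.iter() walks the tree in document order, so filtering by tag
# behaves like the union "//t | //price".
wanted = {"t", "price"}
union = [el.text for el in root.iter() if el.tag in wanted]

# Concatenating two separate queries keeps every match but loses the interleaving.
concatenated = [el.text for el in root.findall(".//t") + root.findall(".//price")]

print(union[:2])
print(concatenated[:2])
```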

Example 5: Position

check_element_text(html = html, css = "div:first-of-type")

[1] "Welcome 2022! Beat COVID-19! 1000000000.00"

check_element_text(html = html, css = "div:nth-of-type(2)")

[1] "Happy Spring Festival! Dumpling YYDS! 100.00"

check_element_text(html = html, css = "div:last-of-type")

[1] "Happy Valentine’s Day! Let’s romantic! 1000.0 Happy Lantern Festival! Sweet Dumpling YYDS! 100.0"

check_element_text(html = html, xpath = "//div[position()=1]")

[1] "Welcome 2022! Beat COVID-19! 1000000000.00"

check_element_text(html = html, xpath = "//div[2]")

[1] "Happy Spring Festival! Dumpling YYDS! 100.00"

check_element_text(html = html, xpath = "//div[last()]")

[1] "Happy Valentine’s Day! Let’s romantic! 1000.0 Happy Lantern Festival! Sweet Dumpling YYDS! 100.0"
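Positions in these selectors are counted among siblings of the same parent, not across the whole document. The sketch below shows this with Python's standard-library `xml.etree.ElementTree`, whose XPath subset accepts `[n]` and `[last()]` but not `position()=1`. `SAMPLE` is a hypothetical stand-in for the linked page and `text_of` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the linked sample page (markup is an assumption).
SAMPLE = """
<html><body>
  <div class="class1">
    <p class="class3">Welcome 2022! <t class="text1">Beat COVID-19!</t> <price>1000000000.00</price></p>
  </div>
  <div id="div2">
    <p class="class3">Happy Spring Festival! <t>Dumpling YYDS!</t> <price>100.00</price></p>
  </div>
  <div class="class1 class2">
    <p class="class3">Happy Valentine's Day! <t>Let's romantic!</t> <price>1000.0</price></p>
    <p>Happy Lantern Festival! <t>Sweet Dumpling YYDS!</t> <price>100.0</price></p>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())


def text_of(el):
    """Whitespace-normalized text of an element and its descendants."""
    return " ".join("".join(el.itertext()).split())


first = [text_of(d) for d in root.findall(".//div[1]")]
second = [text_of(d) for d in root.findall(".//div[2]")]
last = [text_of(d) for d in root.findall(".//div[last()]")]

# All three <div>s share one parent, so each query above returns one element.
# The <p> elements live in three different parents, so "first p" matches three.
first_paragraphs = root.findall(".//p[1]")

print(first, second, len(last), len(first_paragraphs))
```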

Example 6: ID attribute

check_element_text(html = html, css = "div#div2")

[1] "Happy Spring Festival! Dumpling YYDS! 100.00"

check_element_text(html = html, xpath = "//div[@id='div2']")

[1] "Happy Spring Festival! Dumpling YYDS! 100.00"

Example 7: Class attribute

check_element_text(html = html, css = "div.class1")

[1] "Welcome 2022! Beat COVID-19! 1000000000.00"
[2] "Happy Valentine’s Day! Let’s romantic! 1000.0 Happy Lantern Festival! Sweet Dumpling YYDS! 100.0"

check_element_text(html = html, css = "div.class1.class2")

[1] "Happy Valentine’s Day! Let’s romantic! 1000.0 Happy Lantern Festival! Sweet Dumpling YYDS! 100.0"

check_element_text(html = html, xpath = "//div[@class='class1']")

[1] "Welcome 2022! Beat COVID-19! 1000000000.00"

check_element_text(html = html, xpath = "//div[@class='class1']")

[1] "Happy Valentine’s Day! Let’s romantic! 1000.0 Happy Lantern Festival! Sweet Dumpling YYDS! 100.0"
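The CSS and XPath results above differ because `.class1` matches any element whose class list contains the token `class1`, while `@class='class1'` compares the whole attribute string. In full XPath 1.0 the usual token-matching idiom is `contains(concat(' ', normalize-space(@class), ' '), ' class1 ')`. The sketch below shows both behaviours with Python's standard-library `xml.etree.ElementTree`, doing the token test in Python because that library does not support `contains()`. `SAMPLE` is a hypothetical stand-in for the linked page and `has_classes` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the linked sample page (markup is an assumption).
SAMPLE = """
<html><body>
  <div class="class1">
    <p class="class3">Welcome 2022! <t class="text1">Beat COVID-19!</t> <price>1000000000.00</price></p>
  </div>
  <div id="div2">
    <p class="class3">Happy Spring Festival! <t>Dumpling YYDS!</t> <price>100.00</price></p>
  </div>
  <div class="class1 class2">
    <p class="class3">Happy Valentine's Day! <t>Let's romantic!</t> <price>1000.0</price></p>
    <p>Happy Lantern Festival! <t>Sweet Dumpling YYDS!</t> <price>100.0</price></p>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())


def has_classes(el, *names):
    """True if every name appears as a whitespace-separated token in class."""
    return set(names) <= set(el.get("class", "").split())


# Exact string comparison, like //div[@class='class1'].
exact = root.findall(".//div[@class='class1']")

# Token comparison, like div.class1 and div.class1.class2.
token = [d for d in root.iter("div") if has_classes(d, "class1")]
both = [d for d in root.iter("div") if has_classes(d, "class1", "class2")]

print(len(exact), len(token), len(both))
```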

Example 8: Asterisk

check_element_text(html = html, css = "*.class3")

[1] "Welcome 2022! Beat COVID-19! 1000000000.00"
[2] "Happy Spring Festival! Dumpling YYDS! 100.00"
[3] "Happy Valentine’s Day! Let’s romantic! 1000.0"

check_element_text(html = html, xpath = "//*[@class='class3']")

[1] "Welcome 2022! Beat COVID-19! 1000000000.00"
[2] "Happy Spring Festival! Dumpling YYDS! 100.00"
[3] "Happy Valentine’s Day! Let’s romantic! 1000.0"

Example 9: Relationship

check_element_text(html = html, css = "div#div2 t")

[1] "Dumpling YYDS!"

check_element_text(html = html, xpath = "//div[@id='div2']/descendant::t")

[1] "Dumpling YYDS!"

check_element_text(html = html, xpath = "//t[@class='text1']/ancestor::div")

[1] "Welcome 2022! Beat COVID-19! 1000000000.00"
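The `descendant` and `ancestor` axes walk down and up the tree from a starting node. Python's standard-library `xml.etree.ElementTree` has no axis syntax and no parent pointers, so the sketch below approximates `descendant::t` with `//t` and `ancestor::div` with an explicit child-to-parent map. `SAMPLE` is a hypothetical stand-in for the linked page; this is one possible illustration, not the chapter's helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the linked sample page (markup is an assumption).
SAMPLE = """
<html><body>
  <div class="class1">
    <p class="class3">Welcome 2022! <t class="text1">Beat COVID-19!</t> <price>1000000000.00</price></p>
  </div>
  <div id="div2">
    <p class="class3">Happy Spring Festival! <t>Dumpling YYDS!</t> <price>100.00</price></p>
  </div>
  <div class="class1 class2">
    <p class="class3">Happy Valentine's Day! <t>Let's romantic!</t> <price>1000.0</price></p>
    <p>Happy Lantern Festival! <t>Sweet Dumpling YYDS!</t> <price>100.0</price></p>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Downwards: every <t> below the div with id="div2".
descendants = [el.text for el in root.findall(".//div[@id='div2']//t")]

# Upwards: build a child -> parent map, then climb from the starting node.
parent_of = {child: parent for parent in root.iter() for child in parent}
node = root.find(".//t[@class='text1']")
ancestors = []
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node)

# Keep only the ancestors named div, like ancestor::div.
div_ancestors = [a for a in ancestors if a.tag == "div"]

print(descendants, [a.tag for a in ancestors])
```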

Example 10: Calculation

CSS: not available. CSS selectors cannot compare an element's content against a value.

check_element_text(html = html, xpath = "//p[price>50.0]/t")

[1] "Beat COVID-19!" "Dumpling YYDS!" "Let’s romantic!" "Sweet Dumpling YYDS!"
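Every price in the sample exceeds 50, so the predicate above keeps all four titles; a higher threshold makes the filtering visible. Python's standard-library `xml.etree.ElementTree` does not support numeric comparisons in predicates, so the sketch below performs the comparison in Python instead. `SAMPLE` is a hypothetical stand-in for the linked page and `titles_over` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the linked sample page (markup is an assumption).
SAMPLE = """
<html><body>
  <div class="class1">
    <p class="class3">Welcome 2022! <t class="text1">Beat COVID-19!</t> <price>1000000000.00</price></p>
  </div>
  <div id="div2">
    <p class="class3">Happy Spring Festival! <t>Dumpling YYDS!</t> <price>100.00</price></p>
  </div>
  <div class="class1 class2">
    <p class="class3">Happy Valentine's Day! <t>Let's romantic!</t> <price>1000.0</price></p>
    <p>Happy Lantern Festival! <t>Sweet Dumpling YYDS!</t> <price>100.0</price></p>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())


def titles_over(threshold):
    """Titles of <p> elements whose <price> child exceeds threshold,
    mirroring the XPath //p[price>threshold]/t."""
    return [
        p.findtext("t")
        for p in root.iter("p")
        if float(p.findtext("price")) > threshold
    ]


print(titles_over(50.0))
print(titles_over(500.0))
```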

3.3 Summary

3.3.1 CSS Selector Summary

Selector	Example	Result
.class	.intro	Selects all elements with class=“intro”
.class1.class2	.name1.name2	Selects all elements with both name1 and name2 set within its class attribute
.class1 .class2	.name1 .name2	Selects all elements with name2 that is a descendant of an element with name1
#id	#firstname	Selects the element with id=“firstname”
*	*	Selects all elements
element	p	Selects all p elements
element.class	p.intro	Selects all p elements with class=“intro”
element,element	div, p	Selects all div elements and all p elements
element element	div p	Selects all p elements inside div elements
element>element	div > p	Selects all p elements where the parent is a div element
element+element	div + p	Selects every p element that immediately follows a div element (same parent)
element1~element2	p ~ ul	Selects every ul element that is preceded by a sibling p element
[attribute]	[target]	Selects all elements with a target attribute
[attribute=value]	[target=_blank]	Selects all elements with target="_blank"
[attribute~=value]	[title~=flower]	Selects all elements with a title attribute containing the word “flower”
[attribute|=value]	[lang|=en]	Selects all elements with a lang attribute value equal to “en” or starting with “en-”
[attribute^=value]	a[href^="https"]	Selects every a element whose href attribute value begins with “https”
[attribute$=value]	a[href$=".pdf"]	Selects every a element whose href attribute value ends with “.pdf”
[attribute*=value]	a[href*="w3schools"]	Selects every a element whose href attribute value contains the substring “w3schools”
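The attribute operators in the table differ only in how they compare the attribute value with the given string. The sketch below restates each one as a plain Python string test; `attr_matches` is a hypothetical helper written for illustration, not part of any selector library.

```python
def attr_matches(value, op, target):
    """Apply a CSS attribute-selector operator to an attribute value."""
    if op == "=":   # exact match
        return value == target
    if op == "~=":  # target is one of the whitespace-separated words
        return target in value.split()
    if op == "|=":  # exact match, or target followed by a hyphen
        return value == target or value.startswith(target + "-")
    # The substring operators never match an empty target.
    if op == "^=":  # prefix
        return bool(target) and value.startswith(target)
    if op == "$=":  # suffix
        return bool(target) and value.endswith(target)
    if op == "*=":  # substring anywhere
        return bool(target) and target in value
    raise ValueError(f"unknown operator: {op}")


href = "https://example.org/report.pdf"
print(attr_matches(href, "^=", "https"), attr_matches(href, "$=", ".pdf"))
print(attr_matches("sunflowers", "~=", "flower"), attr_matches("sunflowers", "*=", "flower"))
print(attr_matches("en-US", "|=", "en"), attr_matches("english", "|=", "en"))
```

The second and third lines show the distinctions that are easiest to miss: `~=` needs a whole word where `*=` accepts any substring, and `|=` needs the hyphen where `^=` accepts any prefix.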

Selector	Example	Result
:active	a:active	Selects the active link
::after	p::after	Inserts generated content after the content of each p element
::before	p::before	Inserts generated content before the content of each p element
:checked	input:checked	Selects every checked input element
:default	input:default	Selects the default input element
:disabled	input:disabled	Selects every disabled input element
:empty	p:empty	Selects every p element that has no children (including text nodes)
:enabled	input:enabled	Selects every enabled input element
:first-child	p:first-child	Selects every p element that is the first child of its parent
::first-letter	p::first-letter	Selects the first letter of every p element
::first-line	p::first-line	Selects the first line of every p element
:first-of-type	p:first-of-type	Selects every p element that is the first p element of its parent
:focus	input:focus	Selects the input element which has focus
:fullscreen	:fullscreen	Selects the element that is in full-screen mode
:hover	a:hover	Selects links on mouse over
:in-range	input:in-range	Selects input elements with a value within a specified range
:indeterminate	input:indeterminate	Selects input elements that are in an indeterminate state
:invalid	input:invalid	Selects all input elements with an invalid value
:lang(language)	p:lang(it)	Selects every p element with a lang attribute equal to “it” (Italian)
:last-child	p:last-child	Selects every p element that is the last child of its parent
:last-of-type	p:last-of-type	Selects every p element that is the last p element of its parent
:link	a:link	Selects all unvisited links
::marker	::marker	Selects the markers of list items
:not(selector)	:not(p)	Selects every element that is not a p element
:nth-child(n)	p:nth-child(2)	Selects every p element that is the second child of its parent
:nth-last-child(n)	p:nth-last-child(2)	Selects every p element that is the second child of its parent, counting from the last child
:nth-last-of-type(n)	p:nth-last-of-type(2)	Selects every p element that is the second p element of its parent, counting from the last child
:nth-of-type(n)	p:nth-of-type(2)	Selects every p element that is the second p element of its parent
:only-of-type	p:only-of-type	Selects every p element that is the only p element of its parent
:only-child	p:only-child	Selects every p element that is the only child of its parent
:optional	input:optional	Selects input elements with no “required” attribute
:out-of-range	input:out-of-range	Selects input elements with a value outside a specified range
::placeholder	input::placeholder	Selects the placeholder text of input elements that have the “placeholder” attribute specified
:read-only	input:read-only	Selects input elements with the “readonly” attribute specified
:read-write	input:read-write	Selects input elements with the “readonly” attribute NOT specified
:required	input:required	Selects input elements with the “required” attribute specified
:root	:root	Selects the document’s root element
::selection	::selection	Selects the portion of an element that is selected by a user
:target	#news:target	Selects the current active #news element (clicked on a URL containing that anchor name)
:valid	input:valid	Selects all input elements with a valid value
:visited	a:visited	Selects all visited links

CSS Selector Cheat Sheet

3.3.2 XPATH Summary

Expression	Description
nodename	Selects all nodes with the name “nodename”
/	Selects from the root node
//	Selects nodes in the document from the current node that match the selection, no matter where they are
.	Selects the current node
..	Selects the parent of the current node
@	Selects attributes
*	Matches any element node
@*	Matches any attribute node
node()	Matches any node of any kind

Example	Result
bookstore	Selects all nodes with the name “bookstore”
/bookstore	Selects the root element bookstore
bookstore/book	Selects all book elements that are children of bookstore
//book	Selects all book elements no matter where they are in the document
bookstore//book	Selects all book elements that are descendants of the bookstore element
//@lang	Selects all attributes that are named lang
/bookstore/*	Selects all the child element nodes of the bookstore element
//*	Selects all elements in the document
//title[@*]	Selects all title elements which have at least one attribute of any kind

Note: If the path starts with a slash ( / ) it always represents an absolute path to an element!

Axis name	Description	Example	Result
ancestor	Selects all ancestors (parent, grandparent, etc.) of the current node	ancestor::book	Selects all book nodes that are ancestors of the current node
ancestor-or-self	Selects all ancestors (parent, grandparent, etc.) of the current node and the current node itself
attribute	Selects all attributes of the current node	attribute::*	Selects all attributes of the current node
child	Selects all children of the current node	child::*/child::price	Selects all price grandchildren of the current node
descendant	Selects all descendants (children, grandchildren, etc.) of the current node	descendant::book	Selects all book descendants of the current node
descendant-or-self	Selects all descendants (children, grandchildren, etc.) of the current node and the current node itself
following	Selects everything in the document after the closing tag of the current node	following::text()	Selects all text nodes that come after the current node
following-sibling	Selects all siblings after the current node	following-sibling::node()	Selects all siblings after the current node
namespace	Selects all namespace nodes of the current node
parent	Selects the parent of the current node	parent::book	Selects the parent of the current node if it is a book node
preceding	Selects all nodes that are before the current node, except ancestors, attribute nodes and namespace nodes	preceding::price	Selects all price nodes that appear before the current node
preceding-sibling	Selects all siblings before the current node	preceding-sibling::book[price>50.0]	Selects book nodes with price>50 from siblings before the current node
self	Selects the current node	self::*	Selects the current node if it is an element

Operator	Description	Example
|	Computes the union of two node-sets	//book | //cd
+	Addition	6 + 4
-	Subtraction	6 - 4
*	Multiplication	6 * 4
div	Division	8 div 4
=	Equal	price=9.80
!=	Not equal	price!=9.80
<	Less than	price<9.80
<=	Less than or equal to	price<=9.80
>	Greater than	price>9.80
>=	Greater than or equal to	price>=9.80
or	or	price=9.80 or price=9.70
and	and	price>9.00 and price<9.90
mod	Modulus (division remainder)	5 mod 2

XPATH Cheat Sheet