itext.styledxmlparser

The Constant CLASS.

The Constant DISABLED.

The Constant ID.

The Constant LANG.

The Constant REL.

The Constant STYLESHEET.

Comparator class used to sort CSS rule set objects.

The selector comparator.

Class containing possible CSS property keys and values, pseudo element keys, units of measurement, and so on.

The Constant BACKGROUND.

The Constant BACKGROUND_ATTACHMENT.

The Constant BACKGROUND_BLEND_MODE.

The Constant BACKGROUND_CLIP.

The Constant BACKGROUND_COLOR.

The Constant BACKGROUND_IMAGE.

The Constant BACKGROUND_ORIGIN.

The Constant BACKGROUND_POSITION.

The Constant BACKGROUND_REPEAT.

The Constant BACKGROUND_SIZE.

The Constant BORDER.

The Constant BORDER_BOTTOM.

The Constant BORDER_BOTTOM_COLOR.

The Constant BORDER_BOTTOM_LEFT_RADIUS.

The Constant BORDER_BOTTOM_RIGHT_RADIUS.

The Constant BORDER_BOTTOM_STYLE.

The Constant BORDER_BOTTOM_WIDTH.

The Constant BORDER_COLLAPSE.

The Constant BORDER_COLOR.

The Constant BORDER_IMAGE.

The Constant BORDER_LEFT.

The Constant BORDER_LEFT_COLOR.

The Constant BORDER_LEFT_STYLE.

The Constant BORDER_LEFT_WIDTH.

The Constant BORDER_RADIUS.

The Constant BORDER_RIGHT.

The Constant BORDER_RIGHT_COLOR.

The Constant BORDER_RIGHT_STYLE.

The Constant BORDER_RIGHT_WIDTH.

The Constant BORDER_SPACING.

The Constant BORDER_STYLE.

The Constant BORDER_TOP.

The Constant BORDER_TOP_COLOR.

The Constant BORDER_TOP_LEFT_RADIUS.

The Constant BORDER_TOP_RIGHT_RADIUS.

The Constant BORDER_TOP_STYLE.

The Constant BORDER_TOP_WIDTH.

The Constant BORDER_WIDTH.

The Constant BOX_SHADOW.

The Constant CAPTION_SIDE.

The Constant COLOR.

The Constant DIRECTION.

The Constant DISPLAY.

The Constant EMPTY_CELLS.

The Constant FLOAT.

The Constant FONT.

The Constant FONT_FAMILY.

The Constant FONT_FEATURE_SETTINGS.

The Constant FONT_KERNING.

The Constant FONT_LANGUAGE_OVERRIDE.

The Constant FONT_SIZE.

The Constant FONT_SIZE_ADJUST.

The Constant FONT_STRETCH.

The Constant FONT_STYLE.

The Constant FONT_SYNTHESIS.

The Constant FONT_VARIANT.

The Constant FONT_VARIANT_ALTERNATES.

The Constant FONT_VARIANT_CAPS.

The Constant FONT_VARIANT_EAST_ASIAN.

The Constant FONT_VARIANT_LIGATURES.

The Constant FONT_VARIANT_NUMERIC.

The Constant FONT_VARIANT_POSITION.

The Constant FONT_WEIGHT.

The Constant HANGING_PUNCTUATION.

The Constant HYPHENS.

The Constant INLINE_BLOCK.

The Constant LETTER_SPACING.

The Constant LINE_HEIGHT.

The Constant LIST_STYLE.

The Constant LIST_STYLE_IMAGE.

The Constant LIST_STYLE_POSITION.

The Constant LIST_STYLE_TYPE.

The Constant MARGIN.

The Constant MARGIN_BOTTOM.

The Constant MARGIN_LEFT.

The Constant MARGIN_RIGHT.

The Constant MARGIN_TOP.

The Constant MIN_HEIGHT.

The Constant OPACITY.

The Constant ORPHANS.

The Constant OUTLINE.

The Constant OUTLINE_COLOR.

The Constant OUTLINE_STYLE.

The Constant OUTLINE_WIDTH.

The Constant OVERFLOW_WRAP.

The Constant OVERFLOW.

The Constant PADDING.

The Constant PADDING_BOTTOM.

The Constant PADDING_LEFT.

The Constant PADDING_RIGHT.

The Constant PADDING_TOP.

The Constant PAGE_BREAK_AFTER.

The Constant PAGE_BREAK_BEFORE.

The Constant PAGE_BREAK_INSIDE.

The Constant POSITION.

The Constant QUOTES.

The Constant TAB_SIZE.

The Constant TEXT_ALIGN.

The Constant TEXT_ALIGN_LAST.

The Constant TEXT_COMBINE_UPRIGHT.

The Constant TEXT_DECORATION.

The Constant TEXT_DECORATION_LINE.

The Constant TEXT_DECORATION_STYLE.

The Constant TEXT_DECORATION_COLOR.

The Constant TEXT_INDENT.

The Constant TEXT_JUSTIFY.

The Constant TEXT_ORIENTATION.

The Constant TEXT_SHADOW.

The Constant TEXT_TRANSFORM.

The Constant TEXT_UNDERLINE_POSITION.

The Constant TRANSFORM.

The Constant UNICODE_BIDI.

The Constant VISIBILITY.

The Constant WHITE_SPACE.

The Constant WIDOWS.

The Constant WIDTH.

The Constant HEIGHT.

The Constant WORDWRAP.

The Constant WORD_BREAK.

The Constant WORD_SPACING.

The Constant WRITING_MODE.

The Constant ALWAYS.

The Constant ARMENIAN.

The Constant AVOID.

The Constant AUTO.

The Constant BLINK.

The Constant BOLD.

The Constant BOLDER.

The Constant BORDER_BOX.

The Constant BOTTOM.

The Constant CAPTION.

The Constant CENTER.

The Constant CIRCLE.

The Constant CJK_IDEOGRAPHIC.

The Constant CLOSE_QUOTE.

The Constant CONTAIN.

The Constant CONTENT_BOX.

The Constant COVER.

The Constant CURRENTCOLOR.

The Constant DASHED.

The Constant DECIMAL.

The Constant DECIMAL_LEADING_ZERO.

The Constant DEG.

The Constant DISC.

The Constant DOTTED.

The Constant DOUBLE.

The Constant FIXED.

The Constant GEORGIAN.

The Constant GRAD.

The Constant GROOVE.

The Constant HEBREW.

The Constant HIDDEN.

The Constant HIRAGANA.

The Constant HIRAGANA_IROHA.

The Constant ICON.

The Constant INHERIT.

The Constant INITIAL.

The Constant INSET.

The Constant INSIDE.

The Constant ITALIC.

The Constant LARGE.

The Constant LARGER.

The Constant LEFT.

The Constant LIGHTER.

The Constant value LINE_THROUGH.

The Constant LOCAL.

The Constant LOWER_ALPHA.

The Constant LOWER_GREEK.

The Constant LOWER_LATIN.

The Constant LOWER_ROMAN.

The Constant MANUAL.

The Constant MATRIX.

The Constant MEDIUM.

The Constant MENU.

The Constant MESSAGE_BOX.

The Constant NO_OPEN_QUOTE.

The Constant NO_CLOSE_QUOTE.

The Constant NO_REPEAT.

The Constant NONE.

The Constant NORMAL.

The Constant OBLIQUE.

The Constant OPEN_QUOTE.

The Constant OUTSIDE.

The Constant OUTSET.

The Constant value OVERLINE.

The Constant PADDING_BOX.

The Constant RAD.

The Constant REPEAT.

The Constant REPEAT_X.

The Constant REPEAT_Y.

The Constant RIDGE.

The Constant RIGHT.

The Constant ROTATE.

The Constant SCALE.

The Constant SCALE_X.

The Constant SCALE_Y.

The Constant SCROLL.

The Constant SKEW.

The Constant SKEW_X.

The Constant SKEW_Y.

The Constant SMALL.

The Constant SMALL_CAPS.

The Constant SMALL_CAPTION.

The Constant SMALLER.

The Constant SOLID.

The Constant SQUARE.

The Constant START.

The Constant STATIC.

The Constant STATUS_BAR.

The Constant THICK.

The Constant THIN.

The Constant TOP.

The Constant TRANSLATE.

The Constant TRANSLATE_X.

The Constant TRANSLATE_Y.

The Constant TRANSPARENT.

The Constant value UNDERLINE.

The Constant UPPER_ALPHA.

The Constant UPPER_LATIN.

The Constant UPPER_ROMAN.

The Constant value VISIBLE.

The Constant value WAVY.

The Constant X_LARGE.

The Constant X_SMALL.

The Constant XX_LARGE.

The Constant XX_SMALL.

The Constant BACKGROUND_SIZE_VALUES.

The Constant BACKGROUND_ORIGIN_OR_CLIP_VALUES.

The Constant BACKGROUND_REPEAT_VALUES.

The Constant BACKGROUND_ATTACHMENT_VALUES.

The Constant BACKGROUND_POSITION_VALUES.

The Constant BORDER_WIDTH_VALUES.

The Constant BORDER_STYLE_VALUES.

The Constant FONT_ABSOLUTE_SIZE_KEYWORDS.

The Constant METRIC_MEASUREMENTS.

The Constant ACTIVE.

The Constant CHECKED.

The Constant DISABLED.

The Constant EMPTY.

The Constant ENABLED.

The Constant FIRST_CHILD.

The Constant FIRST_OF_TYPE.

The Constant FOCUS.

The Constant HOVER.

The Constant IN_RANGE.

The Constant INVALID.

The Constant LANG.

The Constant LAST_CHILD.

The Constant LAST_OF_TYPE.

The Constant LINK.

The Constant NTH_CHILD.

The Constant NOT.

The Constant NTH_LAST_CHILD.

The Constant NTH_LAST_OF_TYPE.

The Constant NTH_OF_TYPE.

The Constant ONLY_OF_TYPE.

The Constant ONLY_CHILD.

The Constant OPTIONAL.

The Constant OUT_OF_RANGE.

The Constant READ_ONLY.

The Constant READ_WRITE.

The Constant REQUIRED.

The Constant ROOT.

The Constant TARGET.

The Constant VALID.

The Constant VISITED.

The Constant CM.

The Constant EM.

The Constant EX.

The Constant IN.

The Constant MM.

The Constant PC.

The Constant PERCENTAGE.

The Constant PT.

The Constant PX.

The Constant REM.

The Constant Q.

The Constant DPCM.

The Constant DPI.

The Constant DPPX.

Abstract superclass for all CSS at-rules (rules in CSS that start with an @ sign).

The rule name.

Creates a new instance.

the rule name

Gets the rule name.

the rule name

The CSS context node.

The child nodes.

The parent node.

The styles.

Creates a new instance.

the parent node

Class to store a CSS declaration.

The property.

The expression.

Instantiates a new CSS declaration.

the property the expression

Gets the property.

the property

Gets the expression.

the expression

Sets the expression.

the new expression

Class to store a CSS font face at-rule.

Properties in the form of a list of CSS declarations.

Instantiates a new CSS font face rule.

the rule parameters

Gets the properties.

the properties

Class to store a nested CSS at-rule. Nested at-rules are a subset of nested statements, which can be used as a statement of a style sheet as well as inside conditional group rules.

The rule parameters.

The body.

Creates an instance with an empty body.

the rule name the rule parameters

Adds a CSS statement to body.

a CSS statement

Adds CSS statements to the body.

a list of CSS statements

Adds the body CSS declarations.

a list of CSS declarations

Gets the list of CSS statements.

the list of CSS statements

A factory for creating objects.

Creates a new instance.

Creates a new object.

the rule declaration a instance

Extracts the rule name from the CSS rule declaration.

the rule declaration the rule name

Class containing possible CSS rule names.

Creates a new instance.

The Constant BOTTOM_CENTER.

The Constant BOTTOM_LEFT.

The Constant BOTTOM_LEFT_CORNER.

The Constant BOTTOM_RIGHT.

The Constant BOTTOM_RIGHT_CORNER.

The Constant LEFT_BOTTOM.

The Constant LEFT_MIDDLE.

The Constant LEFT_TOP.

The Constant FONT_FACE.

The Constant MEDIA.

The Constant PAGE.

The Constant RIGHT_BOTTOM.

The Constant RIGHT_MIDDLE.

The Constant RIGHT_TOP.

The Constant TOP_CENTER.

The Constant TOP_LEFT.

The Constant TOP_LEFT_CORNER.

The Constant TOP_RIGHT.

The Constant TOP_RIGHT_CORNER.

Class to store a CSS rule set.

Pattern to match "important" in a rule declaration.

The CSS selector.

The normal CSS declarations.

The important CSS declarations.

Creates a new from selector and raw list of declarations. The declarations are split into normal and important under the hood. To construct the instance from normal and important declarations, see the CSS selector the CSS declarations

Gets the CSS selector.

the CSS selector

Gets the normal CSS declarations.

the normal declarations

Gets the important CSS declarations.

the important declarations

Split CSS declarations into normal and important CSS declarations.

the declarations
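The split into normal and important declarations described above could look roughly like the following. This is only a sketch: the class name ImportantSplitSketch, the use of {property, expression} string pairs, and the exact regular expression are assumptions made for illustration, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the library's actual implementation.
final class ImportantSplitSketch {
    // Matches a trailing "!important" marker, ignoring case and surrounding whitespace.
    private static final Pattern IMPORTANT =
            Pattern.compile("\\s*!\\s*important\\s*$", Pattern.CASE_INSENSITIVE);

    final List<String[]> normal = new ArrayList<>();
    final List<String[]> important = new ArrayList<>();

    // Each declaration is a {property, expression} pair (an assumption for this sketch).
    ImportantSplitSketch(List<String[]> declarations) {
        for (String[] declaration : declarations) {
            Matcher m = IMPORTANT.matcher(declaration[1]);
            if (m.find()) {
                // Store the expression without the "!important" marker.
                important.add(new String[] {declaration[0], declaration[1].substring(0, m.start())});
            } else {
                normal.add(declaration);
            }
        }
    }
}
```

Keeping the two lists separate lets a resolver apply important declarations after normal ones, whatever the selector specificity.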

Comparator class used to sort CSS rule set objects.

The selector comparator.

A implementation.

The rule parameters.

Creates a new instance.

the rule declaration

Abstract superclass for all kinds of CSS statements.

Gets a list of objects.

a node a media device description the css rule sets

Class that stores all the CSS statements, and thus acts as a CSS style sheet.

The list of CSS statements.

Creates a new instance.

Adds a CSS statement to the style sheet.

the CSS statement

Append another CSS style sheet to this one.

the other CSS style sheet

Gets the CSS statements of this style sheet.

the CSS statements

Gets the CSS declarations.

the node the media device description the CSS declarations

Gets the CSS declarations.

list of css rule sets the CSS declarations

Populates the CSS declarations map.

the declarations the map

Gets the CSS rule sets.

the node the device description the css rule sets

Puts a declaration in a styles map if the declaration is valid.

the styles map the css declaration

Interface for CSS resolvers.

Resolves the styles of a node given the passed context.

the node the CSS context (RootFontSize, etc.) the map containing the resolved styles

The implementation for media rules.

The media queries.

Creates a .

the rule parameters

Tries to match a media device.

the device description true, if successful

Class that bundles all the values of a media device description.

The type.

The bits per component.

The color index.

The width in points.

The height in points.

Indicates if the media device is a grid.

The scan value.

The orientation.

The number of bits per pixel on a monochrome (greyscale) device.

The resolution in DPI.

See class constants for possible values.

a type of the media to use.

Creates a new instance.

the type the width the height

Creates the default .

the media device description

Gets default instance. Do not modify any fields of the returned media device description because it may lead to unpredictable results. Use if you want to modify device description. the default media device description

Gets the type.

the type

Gets the bits per component.

the bits per component

Sets the bits per component.

the bits per component the media device description

Gets the color index.

the color index

Sets the color index.

the color index the media device description

Gets the width in points.

the width

Sets the width in points.

the width the media device description

Gets the height in points.

the height

Sets the height in points.

the height the media device description

Checks if the media device is a grid.

true, if is grid

Sets the grid value.

the grid value the media device description

Gets the scan value.

the scan value

Sets the scan value.

the scan value the media device description

Gets the orientation.

the orientation

Sets the orientation.

the orientation the media device description

Gets the number of bits per pixel on a monochrome (greyscale) device.

the number of bits per pixel on a monochrome (greyscale) device

Sets the number of bits per pixel on a monochrome (greyscale) device.

the number of bits per pixel on a monochrome (greyscale) device the media device description

Gets the resolution in DPI.

the resolution

Sets the resolution in DPI.

the resolution the media device description

Class that bundles all the media expression properties.

The default font size.

Indicates if there's a "min-" prefix.

Indicates if there's a "max-" prefix.

The feature.

The value.

Creates a new instance.

the feature the value

Tries to match a .

the device description true, if successful
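To illustrate how the "min-" and "max-" prefix flags might take part in matching, here is a minimal sketch. The class MediaExpressionSketch and its methods are hypothetical names, and the comparison against a single float device value is an assumption; the real class matches against a full device description.

```java
import java.util.Locale;

// Hypothetical sketch of min-/max- prefix handling; names are illustrative only.
final class MediaExpressionSketch {
    private final boolean minPrefix;
    private final boolean maxPrefix;
    private final String feature;
    private final float value;

    MediaExpressionSketch(String feature, float value) {
        String f = feature.trim().toLowerCase(Locale.ROOT);
        this.minPrefix = f.startsWith("min-");
        this.maxPrefix = f.startsWith("max-");
        // Strip the four-character prefix so that only the bare feature name is kept.
        this.feature = (minPrefix || maxPrefix) ? f.substring(4) : f;
        this.value = value;
    }

    String feature() {
        return feature;
    }

    // "min-" means the device value must be at least the expression value,
    // "max-" means at most, and no prefix means an exact match.
    boolean matches(float deviceValue) {
        if (minPrefix) {
            return deviceValue >= value;
        }
        if (maxPrefix) {
            return deviceValue <= value;
        }
        return Float.compare(deviceValue, value) == 0;
    }
}
```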

Parses an absolute length.

the absolute length as a value the absolute length as a float value
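As a rough sketch of what parsing an absolute length into a float involves, the example below converts a CSS length to points using the standard CSS ratios (1in = 72pt = 96px = 2.54cm = 6pc). The name AbsoluteLengthSketch and the choice of points as the result unit are assumptions for this example.

```java
import java.util.Locale;

// Hypothetical sketch; the result unit (points) is an assumption.
final class AbsoluteLengthSketch {
    // Converts an absolute CSS length such as "2in" or "96px" to points.
    static float toPoints(String length) {
        String s = length.trim().toLowerCase(Locale.ROOT);
        // Walk back over the trailing letters to find where the unit starts.
        int i = s.length();
        while (i > 0 && Character.isLetter(s.charAt(i - 1))) {
            i--;
        }
        float number = Float.parseFloat(s.substring(0, i));
        String unit = s.substring(i);
        switch (unit) {
            case "pt": return number;
            case "px": return number * 0.75f;
            case "in": return number * 72f;
            case "pc": return number * 12f;
            case "cm": return number * 72f / 2.54f;
            case "mm": return number * 72f / 25.4f;
            case "q":  return number * 72f / 101.6f;
            default:
                throw new IllegalArgumentException("Not an absolute length: " + length);
        }
    }
}
```

Relative units such as em or rem are rejected here because they cannot be resolved without a font size.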

Class that bundles all the media feature values.

Creates a new instance.

Value: <integer>
Media: visual
Accepts min/max prefixes: yes
Indicates the number of bits per color component of the output device. If the device is not a color device, this value is zero.

Value: <integer>
Media: visual
Accepts min/max prefixes: yes
Indicates the number of entries in the color look-up table for the output device.

Value: <ratio>
Media: visual, tactile
Accepts min/max prefixes: yes
Describes the aspect ratio of the targeted display area of the output device. This value consists of two positive integers separated by a slash ("/") character. This represents the ratio of horizontal pixels (first term) to vertical pixels (second term).

Value: <mq-boolean> which is an <integer> that can only have the 0 and 1 value.
Media: all
Accepts min/max prefixes: no
Determines whether the output device is a grid device or a bitmap device. If the device is grid-based (such as a TTY terminal or a phone display with only one font), the value is 1. Otherwise it is zero.

Value: progressive | interlace
Media: tv
Accepts min/max prefixes: no
Describes the scanning process of television output devices.

Value: landscape | portrait
Media: visual
Accepts min/max prefixes: no
Indicates whether the viewport is in landscape (the display is wider than it is tall) or portrait (the display is taller than it is wide) mode.

Value: <integer>
Media: visual
Accepts min/max prefixes: yes
Indicates the number of bits per pixel on a monochrome (greyscale) device. If the device isn't monochrome, the device's value is 0.

Value: <length>
Media: visual, tactile
Accepts min/max prefixes: yes
The height media feature describes the height of the output device's rendering surface (such as the height of the viewport or of the page box on a printer).

Value: <resolution>
Media: bitmap
Accepts min/max prefixes: yes
Indicates the resolution (pixel density) of the output device. The resolution may be specified in either dots per inch (dpi) or dots per centimeter (dpcm).

Value: <length>
Media: visual, tactile
Accepts min/max prefixes: yes
The width media feature describes the width of the rendering surface of the output device (such as the width of the document window, or the width of the page box on a printer).

Class that bundles all the media query properties.

The logical "only" value.

The logical "not" value.

The type.

The expressions.

Creates a new instance.

the type the expressions logical "only" value logical "not" value

Tries to match a device description with the media query.

the device description true, if successful
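The interplay of the media type, the expressions, and the logical "not" value can be sketched as below. MediaQuerySketch is a hypothetical name, and the pre-computed expressionsMatch flag stands in for evaluating each expression against the device. The sketch follows the Media Queries rule that "not" negates the result of the whole query, while "only" does not affect matching.

```java
// Hypothetical sketch of media query matching; not the library's API.
final class MediaQuerySketch {
    // queryType may be null when the query has no explicit media type.
    static boolean matches(boolean not, String queryType, String deviceType, boolean expressionsMatch) {
        boolean typeMatches = queryType == null
                || "all".equals(queryType)
                || queryType.equals(deviceType);
        boolean result = typeMatches && expressionsMatch;
        // "not" inverts the outcome of the entire query.
        return not ? !result : result;
    }
}
```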

Utilities class that parses values into or values.

Creates a instance.

Parses a into a of values.

the media queries in the form of a the resulting of values

Parses a into a value.

the media query in the form of a the resulting value

Parses a into a list of values.

the media expressions in the form of a indicates if the media expression shall start with "and" the resulting list of values

Parses a into a value.

the media expression in the form of a the resulting value

Class that bundles a series of media rule constants.

Creates a new instance.

The Constant AND.

The Constant MIN.

The Constant MAX.

The Constant NOT.

The Constant ONLY.

Class that bundles all the media types and allows you to register valid media types in a .

The Constant registeredMediaTypes.

The Constant ALL.

The Constant AURAL.

The Constant BRAILLE.

The Constant EMBOSSED.

The Constant HANDHELD.

The Constant PRINT.

The Constant PROJECTION.

The Constant SCREEN.

The Constant SPEECH.

The Constant TTY.

The Constant TV.

Creates a new instance.

Checks if a media type is registered as a valid media type.

the media type true, if it's a valid media type

Registers a media type.

the media type the string

implementation for margins.

The page selectors.

Creates a new instance.

the rule name

Creates a new instance.

the rule name the rule parameters

Sets the page selectors.

the new page selectors

Class for a non-standard .

Creates a new instance.

the selector the declarations

implementation for page rules.

The page selectors.

Creates a new instance.

the rule parameters

Class that bundles a series of page context constants.

The Constant BLANK.

The Constant FIRST.

The Constant LEFT.

The Constant RIGHT.

implementation for page contexts.

The page type name.

The page classes.

Creates a new instance.

the parent node

Adds a page class.

the page class the page context node

Gets the page type name.

the page type name

Sets the page type name.

the page type name the page context node

Gets the list of page classes.

the page classes

implementation for page margin box contexts.

The Constant PAGE_MARGIN_BOX_TAG.

The margin box name.

Creates a new instance.

the parent node the margin box name

Gets the margin box name.

the margin box name

Sets the rectangle in which page margin box contents are shown.

the defining position and dimensions of the margin box content area

Gets the rectangle in which page margin box contents should be shown.

the defining position and dimensions of the margin box content area

Sets the containing block rectangle for the margin box, which is used for calculating some of the margin box properties relative values.

the which is used as a reference for some margin box relative properties calculations.

Tokenizer for CSS declaration values.

The source string.

The current index.

The quote string, either "'" or "\"".

Indicates if we're inside a string.

The depth.

Creates a new instance.

the property value

Gets the next valid token.

the next valid token

Gets the next token.

the next token

Checks if a character is a hexadecimal digit.

the character true, if it's a hexadecimal digit

Processes a function token.

the token the function buffer
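The fields listed for the tokenizer (source string, current index, quote string, in-string flag, depth) suggest a single pass that tracks quotes and parentheses. A minimal sketch of that idea follows; ValueTokenizerSketch is a hypothetical name, tokens are returned as plain strings rather than typed tokens, and escape sequences are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a declaration value tokenizer; escapes are not handled.
final class ValueTokenizerSketch {
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        char quote = 0; // active quote character, or 0 when outside a string
        int depth = 0;  // nesting depth of parentheses (function arguments)
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (quote != 0) {
                buffer.append(c);
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
                buffer.append(c);
            } else if (c == '(') {
                depth++;
                buffer.append(c);
            } else if (c == ')') {
                if (depth > 0) {
                    depth--;
                }
                buffer.append(c);
            } else if (depth == 0 && (c == ',' || Character.isWhitespace(c))) {
                // Whitespace and commas end a token only outside strings and functions.
                flush(buffer, tokens);
                if (c == ',') {
                    tokens.add(",");
                }
            } else {
                buffer.append(c);
            }
        }
        flush(buffer, tokens);
        return tokens;
    }

    private static void flush(StringBuilder buffer, List<String> tokens) {
        if (buffer.length() > 0) {
            tokens.add(buffer.toString());
            buffer.setLength(0);
        }
    }
}
```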

The Token class.

The value.

The type.

Creates a new instance.

the value the type

Gets the value.

the value

Gets the type.

the type

Checks if the token is a string.

true, if is string

Enumeration of the different token types.

The string type.

The function type.

The comma type.

Unknown type.

Utilities class to parse CSS page selectors.

The pattern string for page selectors.

The pattern for page selectors.

Parses the selector items into a list of instances.

the selector items in the form of a the resulting list of instances

Utilities class to parse CSS rule sets.

The logger.

Creates a new instance.

Parses property declarations.

the property declarations in the form of a the list of instances

Parses a rule set into a list of instances. This method returns a because a selector can be compound, like "p, div, #navbar". the selector the properties the resulting list of instances
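To make the compound selector case concrete, the sketch below splits "p, div, #navbar" into its individual selectors, each of which would then get its own rule set with the same declarations. CompoundSelectorSketch is a hypothetical name; skipping commas inside parentheses is an assumption about what a robust splitter would need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the library's parser.
final class CompoundSelectorSketch {
    // Splits a compound selector on top-level commas.
    static List<String> split(String selector) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // parentheses depth, e.g. inside :not(...)
        for (char c : selector.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            }
            if (c == ',' && depth == 0) {
                add(result, current);
            } else {
                current.append(c);
            }
        }
        add(result, current);
        return result;
    }

    private static void add(List<String> result, StringBuilder current) {
        String s = current.toString().trim();
        if (!s.isEmpty()) {
            result.add(s);
        }
        current.setLength(0);
    }
}
```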

Splits CSS properties into an array of values.

the properties the array of property values

Gets the semicolon position.

the properties the from index the semicolon position
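Finding the semicolon position is less trivial than it sounds, because a semicolon can appear inside a quoted string or a url(...) value. The following sketch shows one way to handle that; PropertySplitSketch and its method names are hypothetical, and whether the library treats these cases the same way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of semicolon-aware property splitting.
final class PropertySplitSketch {
    // Returns the index of the next semicolon at or after fromIndex that is not
    // inside quotes or parentheses, or -1 if there is none.
    static int semicolonPosition(String properties, int fromIndex) {
        char quote = 0;
        int depth = 0;
        for (int i = fromIndex; i < properties.length(); i++) {
            char c = properties.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            } else if (c == ';' && depth == 0) {
                return i;
            }
        }
        return -1;
    }

    // Splits a properties string into trimmed, non-empty declarations.
    static List<String> split(String properties) {
        List<String> result = new ArrayList<>();
        int from = 0;
        while (true) {
            int pos = semicolonPosition(properties, from);
            String part = (pos < 0 ? properties.substring(from) : properties.substring(from, pos)).trim();
            if (!part.isEmpty()) {
                result.add(part);
            }
            if (pos < 0) {
                return result;
            }
            from = pos + 1;
        }
    }
}
```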

Utilities class to parse a CSS selector.

Set of legacy pseudo elements (first-line, first-letter, before, after).

The pattern string for selectors.

The pattern for selectors.

Creates a new instance.

Parses the selector items.

the selectors in the form of a the resulting list of

Resolves a pseudo selector, appends it to list and updates in process.

list of items to which new selector will be added to the pseudo selector the corresponding that will be updated.

Internal class not for public use. Its API may change.

Get the index at which the last match started.

Get the text of the last match.

Get the source text being matched.

Return whether or not the match was successful.

Attempt to match the pattern against the next piece of the source text.

Get the index at which the next match of the pattern takes place.

the index at which to start matching the pattern

Utilities class to parse a CSS style sheet.

Creates a new .

Parses a stream into a .

the stream the base url the resulting

Parses a stream into a .

the stream the resulting

Parses a string into a .

the style sheet data the base url the resulting

Parses a string into a .

the data the resulting

implementation for the At-rule state.

The state machine that parses the CSS.

Creates a new instance.

the state machine that parses the CSS

implementation for the end comment state.

The state machine that parses the CSS.

Creates a new instance.

the state machine that parses the CSS

implementation for the inner comment state.

The state machine that parses the CSS.

Creates a new instance.

the state machine that parses the CSS

implementation for the start comment state.

The state machine that parses the CSS.

Creates a new instance.

the state machine that parses the CSS

implementation for the conditional group At-rule state.

The state machine that parses the CSS.

Creates a new instance.

the state machine that parses the CSS

State machine that will parse content into a style sheet.

The current state.

Indicates if the current rule is supported.

The previous active state (excluding comments).

A buffer to store temporary results.

The current selector.

The style sheet.

The nested At-rules.

The stored properties without selector.

Set of the supported rules.

Set of conditional group rules.

The comment start state.

The comment end state.

The comment inner state.

The unknown state.

The rule state.

The properties state.

The conditional group at rule block state.

The At-rule block state.

The URI resolver.

Creates a new instance.

the base URL

Process a character using the current state.

the character

Gets the resulting style sheet.

the resulting style sheet

Appends a character to the buffer.

the character

Gets the contents of the buffer.

the buffer contents

Resets the buffer.

Enter the previous active state.

Enter the comment start state.

Enter the comment end state.

Enter the comment inner state.

Enter the rule state.

Enter the unknown state if nested blocks are finished.

Enter the rule state, based on whether the current state is unsupported or conditional.

Enter the unknown state.

Enter the At-rule block state.

Enter the conditional group At-rule block state.

Enter the properties state.

Store the current selector.

Store the current properties.

Store the current properties without selector.

Store the semicolon At-rule.

Finish the At-rule block.

Push the block preceding At-rule.

Save the active state.

Sets the current state.

the new state

Processes the properties.

the selector the properties

Processes the properties.

the properties

Normalizes the declaration URIs.

the declarations

Processes the semicolon At-rule.

the rule str

Processes the finished At-rule block.

the at rule

Checks if the current rule is supported.

true, if the current rule is supported

Checks if the current At-rule is a conditional group rule (or if it's unsupported).

true, if the current At-rule is unsupported or conditional
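The character-by-character design, with separate comment start, inner, and end states and a return to the previous active state, can be illustrated with a much smaller machine that only strips comments. CommentStateSketch and its single ACTIVE state are simplifications assumed for this example; the real state machine distinguishes rule, properties, and at-rule states as listed above.

```java
// Hypothetical, reduced state machine: it only removes /* ... */ comments.
final class CommentStateSketch {
    enum State { ACTIVE, COMMENT_START, COMMENT_INNER, COMMENT_END }

    private State state = State.ACTIVE;
    private final StringBuilder buffer = new StringBuilder();

    // Processes a single character using the current state.
    void process(char c) {
        switch (state) {
            case ACTIVE:
                if (c == '/') {
                    state = State.COMMENT_START;
                } else {
                    buffer.append(c);
                }
                break;
            case COMMENT_START:
                if (c == '*') {
                    state = State.COMMENT_INNER;
                } else {
                    // The previous slash was ordinary content after all.
                    buffer.append('/');
                    if (c != '/') {
                        buffer.append(c);
                        state = State.ACTIVE;
                    }
                }
                break;
            case COMMENT_INNER:
                if (c == '*') {
                    state = State.COMMENT_END;
                }
                break;
            case COMMENT_END:
                if (c == '/') {
                    state = State.ACTIVE; // back to the previous active state
                } else if (c != '*') {
                    state = State.COMMENT_INNER;
                }
                break;
        }
    }

    String result() {
        // A pending slash at the end of input is ordinary content.
        return state == State.COMMENT_START ? buffer + "/" : buffer.toString();
    }

    static String strip(String css) {
        CommentStateSketch machine = new CommentStateSketch();
        for (int i = 0; i < css.length(); i++) {
            machine.process(css.charAt(i));
        }
        return machine.result();
    }
}
```

Keeping comment handling in its own states means the active states never need to know that a comment interrupted them.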

Interface for all parser state implementations.

Process a character.

the character

implementation for the block state.

The state machine that parses the CSS.

Creates a new instance.

the state machine that parses the CSS

implementation for the rule state.

The state machine that parses the CSS.

Creates a new instance.

the state machine that parses the CSS

implementation for the unknown state.

The state machine that parses the CSS.

Creates a new instance.

the state machine that parses the CSS

implementation for pseudo elements.

The pseudo element name.

The pseudo element tag name.

Creates a new instance.

the parent node the pseudo element name

Gets the pseudo element name.

the pseudo element name

A simple implementation.

Utilities class for pseudo elements.

The prefix for pseudo elements.

Creates the pseudo element tag name.

the pseudo element name the tag name

Checks for before or after elements.

the node true, if successful

Container for CSS context properties that influence CSS resolution. This class only contains properties relevant for any generic XML+CSS combo: specific properties must be implemented in a project-specific subclass. Used by .

The quotes depth.

Gets the quotes depth.

the quotes depth

Sets the quotes depth.

the new quotes depth

implementation for content nodes.

The attributes.

The tag name.

Creates a new instance.

the parent node the pseudo element name the attributes

Simple implementation.

The attributes.

Creates a new instance.

the attributes

Simple implementation.

The entry.

Creates a new instance.

the entry

iterator.

The iterator.

Creates a new instance.

the iterator

Helper class that allows you to get the default values of CSS properties.

A map with properties and their default values.

Gets the default value of a property.

the property the default value

Helper class that allows you to check if a property is inheritable.

Set of inheritable properties in accordance with "http://www.w3schools.com/cssref/" and "https://developer.mozilla.org/en-US/docs/Web/CSS/Reference"

Checks if a property is inheritable.

the CSS property true, if the property is inheritable

Utilities class to merge CSS properties.

Creates a new class.

Merges text decoration.

the first value the second value the merged value

Normalizes text decoration values.

the text decoration value a set of normalized decoration values
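One plausible reading of merging and normalizing text decoration values is sketched below: each value is split into a set of keywords, "none" contributes nothing, and the union is joined back into a single value. TextDecorationMergeSketch is a hypothetical name and the exact rules are assumptions; the library's behaviour may differ in edge cases.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the merge rules here are assumptions.
final class TextDecorationMergeSketch {
    // Splits a value into its keywords; "none" normalizes to the empty set.
    static Set<String> normalize(String value) {
        String[] parts = value.trim().toLowerCase(Locale.ROOT).split("\\s+");
        Set<String> result = new LinkedHashSet<>(Arrays.asList(parts));
        result.remove("");
        if (result.contains("none")) {
            result.clear();
        }
        return result;
    }

    // Merges two values, keeping insertion order and dropping duplicates.
    static String merge(String first, String second) {
        if (first == null) {
            return second;
        }
        if (second == null) {
            return first;
        }
        Set<String> merged = normalize(first);
        merged.addAll(normalize(second));
        return merged.isEmpty() ? "none" : String.join(" ", merged);
    }
}
```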

Helper class to deal with quoted values in strings.

The empty quote value.

The open quotes.

The close quotes.

Creates a new instance.

the open quotes the close quotes

Creates a instance.

the quotes string indicates whether it's OK to fall back to the default the resulting instance

Creates the default instance.

the instance

Resolves quotes.

the value the CSS context the quote string

Increases the quote depth.

the context

Decreases the quote depth.

the context

Gets the quote.

the depth the quotes the requested quote string
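The depth-based quote lookup can be sketched as follows, using the CSS rules that the last quote pair is reused for deeper nesting levels and that a close quote at depth zero renders nothing. QuotesSketch is a hypothetical name, and it keeps the depth itself instead of reading it from a CSS context as the class above does.

```java
import java.util.List;

// Hypothetical sketch; the depth is stored locally rather than in a CSS context.
final class QuotesSketch {
    private final List<String> openQuotes;
    private final List<String> closeQuotes;
    private int depth = 0;

    QuotesSketch(List<String> openQuotes, List<String> closeQuotes) {
        if (openQuotes.isEmpty() || closeQuotes.isEmpty()) {
            throw new IllegalArgumentException("At least one quote pair is required");
        }
        this.openQuotes = openQuotes;
        this.closeQuotes = closeQuotes;
    }

    // Returns the opening quote for the current depth, then increases the depth.
    String openQuote() {
        String quote = get(openQuotes, depth);
        depth++;
        return quote;
    }

    // Decreases the depth, then returns the closing quote for the new depth.
    String closeQuote() {
        if (depth == 0) {
            return ""; // nothing to close: render no quote, depth stays at zero
        }
        depth--;
        return get(closeQuotes, depth);
    }

    // Deeper levels than there are pairs reuse the last pair.
    private static String get(List<String> quotes, int depth) {
        return quotes.get(Math.min(depth, quotes.size() - 1));
    }
}
```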

Interface for attribute and style-inheritance logic.

Checks if a property or attribute is inheritable.

the identifier for property true, if the property is inheritable, false otherwise

Abstract implementation for borders.

The template for -width properties.

The template for -style properties.

The template for -color properties.

Gets the prefix of a property.

the prefix

Abstract implementation for box definitions.

The template for -left properties.

The template for -right properties.

The template for -bottom properties.

The template for -top properties.

Gets the prefix of a property.

the prefix

Gets the postfix of a property.

the postfix
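The prefix, postfix, and side templates suggest how a box shorthand such as margin or border-width expands into four longhand properties. The sketch below applies the standard CSS rule for one to four values. BoxShorthandSketch is a hypothetical name, the result is a plain map instead of a list of CSS declarations, and values containing spaces (for example function calls) are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of box shorthand expansion; not the library's resolver.
final class BoxShorthandSketch {
    // Expands e.g. ("border", "-width", "1px 2px") into border-top-width, etc.
    static Map<String, String> resolve(String prefix, String postfix, String expression) {
        String trimmed = expression.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("Empty shorthand expression");
        }
        String[] values = trimmed.split("\\s+");
        if (values.length > 4) {
            throw new IllegalArgumentException("Too many values: " + expression);
        }
        // CSS rule: missing values fall back to the opposite or first side.
        String top = values[0];
        String right = values.length > 1 ? values[1] : top;
        String bottom = values.length > 2 ? values[2] : top;
        String left = values.length > 3 ? values[3] : right;

        Map<String, String> resolved = new LinkedHashMap<>();
        resolved.put(prefix + "-top" + postfix, top);
        resolved.put(prefix + "-right" + postfix, right);
        resolved.put(prefix + "-bottom" + postfix, bottom);
        resolved.put(prefix + "-left" + postfix, left);
        return resolved;
    }
}
```

For margin and padding the postfix would be empty, while border-width, border-style, and border-color would pass "-width", "-style", and "-color".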

Abstract implementation for corners definitions.

The template for -bottom-left properties.

The template for -bottom-right properties.

The template for -top-left properties.

The template for -top-right properties.

Gets the prefix of a property.

the prefix

Gets the postfix of a property.

the postfix

implementation for backgrounds.

The Constant UNDEFINED_TYPE.

The Constant BACKGROUND_COLOR_TYPE.

The Constant BACKGROUND_IMAGE_TYPE.

The Constant BACKGROUND_POSITION_TYPE.

The Constant BACKGROUND_POSITION_OR_SIZE_TYPE.

The Constant BACKGROUND_REPEAT_TYPE.

The Constant BACKGROUND_ORIGIN_OR_CLIP_TYPE.

The Constant BACKGROUND_CLIP_TYPE.

The Constant BACKGROUND_ATTACHMENT_TYPE.

Resolves the property type.

the value the property type value

Registers a property based on its type.

the property type the property value the resolved properties indicates whether a slash was encountered

implementation for bottom borders.

implementation for border colors.

implementation for left borders.

implementation for border radius.

implementation for right borders.

implementation for borders.

implementation for border styles.

implementation for top borders.

implementation for border widths.

implementation for fonts.

Unsupported shorthand values.

Font weight values.

Font size values.

Gets the font properties.

the shorthand expression the font properties

implementation for list styles.

The list style types (disc, decimal,...).

The list style positions (inside, outside).

implementation for margins.

implementation for outlines.

implementation for paddings.

Interface for shorthand resolvers.

Resolves a shorthand expression.

the shorthand expression a list of CSS declaration

A factory for creating ShorthandResolver objects.

The map of shorthand resolvers.

Gets a shorthand resolver.

the property the shorthand resolver

Abstract superclass for CSS Selectors.

The selector items.

Creates a new instance.

the selector items

implementation for CSS page margin box selectors.

The page margin box name.

The page selector.

Creates a new instance.

the page margin box name the page selector

implementation for CSS page selectors.

Creates a new instance.

the page selector

implementation for CSS selectors.

Creates a new instance.

the selector items

Creates a new instance.

the selector

Checks if a node matches the selector.

the node the index of the last selector true, if there's a match

Comparator class for CSS Selectors.

Interface for CSS Selector classes.

Calculates the specificity.

the specificity

Checks if a node matches the selector.

the node true, if the selector is a match for the node

implementation for attribute selectors.

The property.

The match symbol.

The value.

Creates a new instance.

the attribute

implementation for class selectors.

The class name.

Creates a new instance.

the class name

implementation for id selectors.

The id.

Creates a new instance.

the id

implementation for page pseudo classes selectors.

Indicates if the page pseudo class is a spread pseudo class (left or right).

The page pseudo class.

Creates a new instance.

the page pseudo class name

implementation for page type selectors.

The page type name.

Creates a new instance.

the page type name

Creates a new instance.

the pseudo class name

Gets all the siblings of a child node.

the child node the sibling nodes

Gets all siblings of a child node with the type of a child node.

the child node the sibling nodes with the type of a child node

The nth A.

The nth B.

Gets the nth arguments.

Resolves the nth.

a node the children true, if successful

implementation for pseudo class selectors.

The arguments.

The pseudo class.

Creates a new instance.

the pseudo class name

implementation for pseudo element selectors.

The pseudo element name.

Creates a new instance.

the pseudo element name

implementation for separator selectors.

The separator character.

Creates a new instance.

the separator character

Gets the separator character.

the separator character

Class that bundles some CSS specificity constants.

Creates a new instance.

The Constant ID_SPECIFICITY.

The Constant CLASS_SPECIFICITY.

The Constant ELEMENT_SPECIFICITY.

implementation for tag selectors.

The tag name.

Indicates if the selector is universally valid.

Creates a new instance.

the tag name

Interface for CSS selector items.

Gets the specificity.

the specificity

Checks if the selector matches an element.

the element true, if there's a match

Utility class for parsing CSS gradient functions.

Checks whether the provided value is a linear gradient or repeating linear gradient function.

Checks whether the provided value is a linear gradient or repeating linear gradient function. This method does not check the validity of arguments list. the value to check if the provided argument is the linear gradient or repeating linear gradient function (even if the arguments list is invalid)

Parses the provided linear gradient or repeating linear gradient function

the value to parse the current element's em value the current element's rem value the constructed from the parsed linear gradient or if the argument value is not a linear gradient or repeating linear gradient function

Utilities class with functionality to normalize CSS properties.

Normalize a property.

the property the normalized property

Appends quoted string.

the current buffer a source where to start in the source. Should point at quote symbol. the new position in the source

Appends url content and end parenthesis if url is correct.

the current buffer a source where to start in the source. Should point at first symbol after "url(". the new position in the source

Checks if spaces can be trimmed after a specific character.

the character true, if spaces can be trimmed after the character

Checks if spaces can be trimmed before a specific character.

the character true, if spaces can be trimmed before the character

Utilities class for CSS operations.

Creates a new instance.

Extracts shorthand properties as list of string lists from a string, where the top level list is shorthand property and the lower level list is properties included in shorthand property.

the source string with shorthand properties the list of string lists

Normalizes a CSS property.

the property the normalized property

Removes double spaces and trims a string.

the string the string without the unnecessary spaces

Parses an integer without throwing an exception if something goes wrong.

a string that might be an integer value the integer value, or null if something went wrong
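The null-on-failure contract described above could be sketched as follows. This is only an illustration: the class name SafeIntegerParseSketch is hypothetical and the code is not the library's own implementation.

```java
/** Hypothetical helper illustrating a parse that returns null instead of throwing. */
final class SafeIntegerParseSketch {
    private SafeIntegerParseSketch() {
    }

    /** Returns the parsed value, or null when the input is null or not a valid integer. */
    static Integer parseInteger(String str) {
        if (str == null) {
            return null;
        }
        try {
            return Integer.valueOf(str);
        } catch (NumberFormatException e) {
            // Swallow the exception and signal failure with null instead.
            return null;
        }
    }
}
```

Callers then check the result for null rather than wrapping every call in a try/catch block. The float and double variants would follow the same pattern.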

Parses a float without throwing an exception if something goes wrong.

a string that might be a float value the float value, or null if something went wrong

Parses a double without throwing an exception if something goes wrong.

a string that might be a double value the double value, or null if something went wrong

Parses an angle with an allowed metric unit (deg, grad, rad) or numeric value (e.g. 123, 1.23, .123) to rad.

String containing the angle to parse default metric to use in case the input string does not specify a metric the angle in radians
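A minimal sketch of this kind of angle conversion, assuming the three units named above (deg, grad, rad) and a non-null default metric. The class AngleSketch and its method are hypothetical and only illustrate the arithmetic.

```java
/** Hypothetical sketch: converts a CSS angle string to radians. */
final class AngleSketch {
    private AngleSketch() {
    }

    static double parseAngle(String angle, String defaultMetric) {
        String value = angle.trim();
        String unit = defaultMetric;
        // "grad" is checked before "rad" because "100grad" also ends with "rad".
        for (String candidate : new String[] {"grad", "deg", "rad"}) {
            if (value.endsWith(candidate)) {
                unit = candidate;
                value = value.substring(0, value.length() - candidate.length());
                break;
            }
        }
        if (unit == null) {
            throw new IllegalArgumentException("No angle unit given for: " + angle);
        }
        double number = Double.parseDouble(value);
        switch (unit) {
            case "deg":
                return Math.toRadians(number);
            case "grad":
                // A full circle is 400 gradians, so 200 gradians equal pi radians.
                return number * Math.PI / 200.0;
            case "rad":
                return number;
            default:
                throw new IllegalArgumentException("Unsupported angle unit: " + unit);
        }
    }
}
```

For instance, parseAngle("180", "deg") and parseAngle("200grad", "deg") both evaluate to approximately pi.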

Parses an angle with an allowed metric unit (deg, grad, rad) or numeric value (e.g. 123, 1.23, .123) to rad.

Parses an angle with an allowed metric unit (deg, grad, rad) or numeric value (e.g. 123, 1.23, .123) to rad. Default metric is degrees. String containing the angle to parse the angle in radians

Parses an aspect ratio into an array with two integers.

a string that might contain two integer values the aspect ratio as an array of two integer values

Parses a length with an allowed metric unit (px, pt, in, cm, mm, pc, q) or numeric value (e.g. 123, 1.23, .123) to pt.
A numeric value (without px, pt, etc in the given length string) is considered to be in the default metric that was given.

the string containing the length. the string containing the metric if it is possible that the length string does not contain one. If null the length is considered to be in px as is default in HTML/CSS. parsed value
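The conversion to points could look roughly like the sketch below. It assumes the usual CSS ratios (1 in = 72 pt = 96 px, 1 pc = 12 pt, 1 q = a quarter of a millimetre). The class AbsoluteLengthSketch is hypothetical and is not the library's own code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: converts an absolute CSS length to points. */
final class AbsoluteLengthSketch {
    private static final Map<String, Double> POINTS_PER_UNIT = new HashMap<>();

    static {
        POINTS_PER_UNIT.put("px", 0.75);
        POINTS_PER_UNIT.put("pt", 1.0);
        POINTS_PER_UNIT.put("in", 72.0);
        POINTS_PER_UNIT.put("cm", 72.0 / 2.54);
        POINTS_PER_UNIT.put("mm", 72.0 / 25.4);
        POINTS_PER_UNIT.put("pc", 12.0);
        POINTS_PER_UNIT.put("q", 72.0 / 101.6);
    }

    private AbsoluteLengthSketch() {
    }

    static double parseAbsoluteLength(String length, String defaultMetric) {
        String value = length.trim().toLowerCase(Locale.ROOT);
        // A missing default metric falls back to px, as in HTML/CSS.
        String unit = defaultMetric == null ? "px" : defaultMetric.toLowerCase(Locale.ROOT);
        for (String candidate : POINTS_PER_UNIT.keySet()) {
            if (value.endsWith(candidate)) {
                unit = candidate;
                value = value.substring(0, value.length() - candidate.length());
                break;
            }
        }
        Double factor = POINTS_PER_UNIT.get(unit);
        if (factor == null) {
            throw new IllegalArgumentException("Unsupported length unit: " + unit);
        }
        return Double.parseDouble(value) * factor;
    }
}
```

With these ratios, "16px" yields 12 pt and a bare "10" with a null default metric is read as 10 px, that is 7.5 pt.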

Parses the absolute length.

the length as a string the length as a float

Parses a relative value based on the base value that was given, in the metric unit of the base value.
(e.g. margin=10% should be based on the page width, so if an A4 is used, the margin = 0.10*595.0 = 59.5f)

in %, em or ex. the value the returned float is based on. the parsed float in the metric unit of the base value.
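The A4 margin example above can be reproduced with a small sketch. The class RelativeValueSketch is hypothetical, and treating 1 ex as half an em is an assumption made for illustration only.

```java
/** Hypothetical sketch: resolves %, em and ex values against a base value. */
final class RelativeValueSketch {
    private RelativeValueSketch() {
    }

    static float parseRelativeValue(String relativeValue, float baseValue) {
        String value = relativeValue.trim();
        if (value.endsWith("%")) {
            float percent = Float.parseFloat(value.substring(0, value.length() - 1));
            return baseValue * percent / 100f;
        }
        if (value.endsWith("em")) {
            return baseValue * Float.parseFloat(value.substring(0, value.length() - 2));
        }
        if (value.endsWith("ex")) {
            // Assumption: 1 ex is approximated as half of the base value.
            return baseValue * Float.parseFloat(value.substring(0, value.length() - 2)) / 2f;
        }
        throw new IllegalArgumentException("Not a relative value: " + relativeValue);
    }
}
```

Calling parseRelativeValue("10%", 595f) gives 59.5, matching the margin example for an A4 page width of 595 pt.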

Convenience method for parsing a value to pt.

Convenience method for parsing a value to pt. Possible values are: a numeric value in pixels (e.g. 123, 1.23, .123), a value with a metric unit (px, in, cm, mm, pc or pt) attached to it, or a value with a relative value (%, em, ex). the value the em value the root em value the unit value

Parses the absolute font size.

Parses the absolute font size. A numeric value (without px, pt, etc in the given length string) is considered to be in the default metric that was given. the font size value as a the string containing the metric if it is possible that the length string does not contain one. If null the length is considered to be in px as is default in HTML/CSS. the font size value as a float

Parses the absolute font size.

Parses the absolute font size. A numeric value (without px, pt, etc in the given length string) is considered to be in px. the font size value as a the font size value as a float

Parses the relative font size.

the relative font size value as a the base value the relative font size value as a float

Parses the border radius of specific corner.

string that defines the border radius of specific corner. the em value the root em value an array of UnitValues that define horizontal and vertical border radius values

Parses the resolution.

the resolution as a string a value in dpi

Method used in preparation of splitting a string containing a numeric value with a metric unit (e.g. 18px, 9pt, 6cm, etc).

Determines the position between digits and affiliated characters ('+','-','0-9' and '.') and all other characters.
e.g. string "16px" will return 2, string "0.5em" will return 3 and string '-8.5mm' will return 4.

containing a numeric value with a metric unit int position between the numeric value and unit or 0 if string is null or string started with a non-numeric value.
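The scan described above can be sketched as a simple loop over the leading numeric characters. The class ValueUnitSplitSketch is hypothetical and only illustrates the documented behaviour.

```java
/** Hypothetical sketch: finds the index where the numeric part of a value ends. */
final class ValueUnitSplitSketch {
    private ValueUnitSplitSketch() {
    }

    static int determinePositionBetweenValueAndUnit(String string) {
        if (string == null) {
            return 0;
        }
        int pos = 0;
        while (pos < string.length()) {
            char c = string.charAt(pos);
            boolean numeric = c == '+' || c == '-' || c == '.' || (c >= '0' && c <= '9');
            if (!numeric) {
                break;
            }
            pos++;
        }
        return pos;
    }
}
```

The returned index can be passed to substring to split "16px" into the value "16" and the unit "px".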

Checks whether a string contains an allowed metric unit in HTML/CSS; px, in, cm, mm, pc, Q or pt.

the string that needs to be checked. boolean true if value contains an allowed metric value.

Checks whether a string contains an allowed angle unit in HTML/CSS: rad, deg or grad.

the string that needs to be checked. boolean true if value contains an allowed angle value.

Checks whether a string contains an allowed value relative to previously set value.

the string that needs to be checked. boolean true if value contains an allowed metric value.

Checks whether a string contains an allowed value relative to font.

the string that needs to be checked. boolean true if value contains an allowed font relative value.

Checks whether a string contains a percentage value

the string that needs to be checked boolean true if value contains an allowed percentage value

Checks whether a string contains an allowed value relative to previously set root value.

the string that needs to be checked. boolean true if value contains a rem value.

Checks whether a string contains an allowed value relative to parent value.

the string that needs to be checked. boolean true if value contains an em value.

Checks whether a string contains an allowed value relative to element font height.

the string that needs to be checked. boolean true if value contains an ex value.

Checks whether a string matches a numeric value (e.g. 123, 1.23, .123).

Checks whether a string matches a numeric value (e.g. 123, 1.23, .123). All these metric values are allowed in HTML/CSS. the string that needs to be checked. boolean true if value contains an allowed metric value.

Parses url("file.jpg") to file.jpg.

the url attribute to parse the parsed url, or the original url if not wrapped in url()
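A minimal sketch of this unwrapping, assuming optional single or double quotes inside the parentheses. The class CssUrlSketch is hypothetical and does not handle escaped characters.

```java
/** Hypothetical sketch: unwraps the content of a CSS url(...) expression. */
final class CssUrlSketch {
    private CssUrlSketch() {
    }

    static String extractUrl(String url) {
        String trimmed = url.trim();
        if (!trimmed.startsWith("url(") || !trimmed.endsWith(")")) {
            // Not wrapped in url(): return the input unchanged.
            return url;
        }
        String inner = trimmed.substring(4, trimmed.length() - 1).trim();
        if (inner.length() >= 2) {
            char first = inner.charAt(0);
            char last = inner.charAt(inner.length() - 1);
            if ((first == '"' || first == '\'') && first == last) {
                // Strip one pair of matching quotes.
                inner = inner.substring(1, inner.length() - 1);
            }
        }
        return inner;
    }
}
```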

Checks if data is Base64 encoded.

the data true, if the data is base 64 encoded

Find the next unescaped character.

a source the character to look for where to start looking the position of the next unescaped character

Checks if a value is a color property.

the value true, if the value contains a color property

Helper method for comparing floating point numbers

first float to compare second float to compare true if both floats are equal within an epsilon defined in this class, false otherwise

Helper method for comparing floating point numbers

first float to compare second float to compare true if both floats are equal within an epsilon defined in this class, false otherwise

Parses the RGBA color.

the color value an RGBA value expressed as an array with four float values

Parses the unicode range.

the string which stores the unicode range the unicode range as a object

Convert given point value to a pixel value.

Convert given point value to a pixel value. 1 px is 0.75 pts. float value to be converted to pixels float converted value pts/0.75f

Convert given point value to a pixel value.

Convert given point value to a pixel value. 1 px is 0.75 pts. double value to be converted to pixels double converted value pts/0.75

Convert given pixel value to a point value.

Convert given pixel value to a point value. 1 px is 0.75 pts. float value to be converted to points float converted value px*0.75

Convert given pixel value to a point value.

Convert given pixel value to a point value. 1 px is 0.75 pts. double value to be converted to points double converted value px*0.75
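Both conversion directions follow from the stated ratio of 1 px to 0.75 pt. The sketch below shows the arithmetic. The class PixelPointSketch is hypothetical, although the method names mirror the descriptions above.

```java
/** Hypothetical sketch of the px/pt conversions, using 1 px = 0.75 pt. */
final class PixelPointSketch {
    private static final double POINTS_PER_PIXEL = 0.75;

    private PixelPointSketch() {
    }

    /** Points to pixels: pts / 0.75. */
    static double convertPtsToPx(double pts) {
        return pts / POINTS_PER_PIXEL;
    }

    /** Pixels to points: px * 0.75. */
    static double convertPxToPts(double px) {
        return px * POINTS_PER_PIXEL;
    }
}
```

A 12 pt font size therefore corresponds to 16 px, and 16 px converts back to 12 pt.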

Class that bundles all the CSS declaration validators.

A map containing all the CSS declaration validators.

Creates a new CssDeclarationValidationMaster instance.

Checks a CSS declaration.

the CSS declaration true, if the validation was successful

Interface for CSS data type validators.

Checks if a value is a valid data type (e.g. a color, an identifier,...).

the value true, if the value is a valid data type

Interface for CSS declaration validators.

Checks if a value is a valid CSS declaration.

the CSS declaration true, if the value is a valid CSS declaration

implementation for colors.

implementation for elements in an enumeration.

The allowed values.

Creates a new instance.

the allowed values

implementation for identifiers.

implementation for identifiers. In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F".
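The identifier rules quoted above can be approximated with a regular expression. This sketch deliberately ignores escaped characters and code points above U+FFFF, and the class CssIdentifierSketch is hypothetical rather than the validator's actual implementation.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: checks the unescaped form of a CSS identifier. */
final class CssIdentifierSketch {
    // Optional single hyphen, then a non-digit name-start character,
    // then any number of name characters (letters, digits, '_', '-', U+00A0 and higher).
    private static final Pattern IDENTIFIER = Pattern.compile(
            "-?[_a-zA-Z\\u00A0-\\uFFFF][_a-zA-Z0-9\\u00A0-\\uFFFF-]*");

    private CssIdentifierSketch() {
    }

    static boolean isValidIdentifier(String value) {
        return value != null && IDENTIFIER.matcher(value).matches();
    }
}
```

Under this simplification "main-nav" and "-moz-box" are accepted, while "1abc", "--x", "-1" and the unescaped "B&W?" are rejected.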

implementation for values for the CSS quotes key.

implementation for the CSS transform property.

implementation in case multiple types have to be checked.

The allowed data types.

Creates a new instance.

the allowed types

implementation to validate a single type.

The data type validator.

Creates a new instance.

the data type validator

Runtime exception that gets thrown if something goes wrong in the HTML to PDF conversion.

The Constant INVALID_GRADIENT_VALUE.

The Constant INVALID_GRADIENT_TO_SIDE_OR_CORNER_STRING.

The Constant INVALID_GRADIENT_COLOR_STOP_VALUE.

The Constant NAN.

Message in case the font provider doesn't know about any fonts.

The Constant UnsupportedEncodingException.

Creates a new instance.

the message

Interface for the XML parsing operations that accept XML and return a document node.

Parses XML provided as an InputStream and an encoding.

the Xml stream the character set. If then parser should detect encoding from stream. a document node

Parses XML provided as a String.

the Xml string a document node

Internal static utilities for handling data.

Loads a file to a Document.

file to load character set of input base URI of document, to resolve relative links against Document

Parses a Document from an input stream.

input stream to parse. You will need to close it. character set of input base URI of document, to resolve relative links against Document

Parses a Document from an input stream, using the provided Parser.

input stream to parse. You will need to close it. character set of input base URI of document, to resolve relative links against alternate parser to use. Document

Writes the input stream to the output stream.

Writes the input stream to the output stream. Doesn't close them. input stream to read from output stream to write to

Read the input stream into a byte buffer.

the input stream to read from the maximum size in bytes to read from the stream. Set to 0 to be unlimited. the filled byte buffer

Parse out a charset from a content type header.

Parse out a charset from a content type header. If the charset is not supported, returns null (so the default will kick in). Content type, e.g. "text/html; charset=EUC-JP"; returns "EUC-JP", or null if not found. Charset is trimmed and uppercased.
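One way to sketch this extraction is a regular expression followed by a lookup with java.nio.charset.Charset. The class ContentTypeCharsetSketch and its pattern are assumptions for illustration, not the library's own code.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extracts a supported charset name from a Content-Type value. */
final class ContentTypeCharsetSketch {
    private static final Pattern CHARSET_PATTERN =
            Pattern.compile("(?i)\\bcharset=\\s*[\"']?([^\\s,;\"']*)");

    private ContentTypeCharsetSketch() {
    }

    static String getCharsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher matcher = CHARSET_PATTERN.matcher(contentType);
        if (matcher.find()) {
            String charset = matcher.group(1).trim().toUpperCase(Locale.ENGLISH);
            try {
                if (Charset.isSupported(charset)) {
                    return charset;
                }
            } catch (IllegalCharsetNameException e) {
                // Illegal names (including the empty string) are treated as "not found".
            }
        }
        return null;
    }
}
```

Returning null for unknown names lets the caller fall back to its default encoding.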

Creates a random string, suitable for use as a mime boundary

Provides a descending iterator and other 1.6 methods to allow support on the 1.5 JRE.

Create a new DescendableLinkedList.

Look at the last element, if there is one.

the last element, or null

Remove and return the last element, if there is one

the last element, or null

Get an iterator that starts at the end of the list and works towards the start.

an iterator that starts at the end of the list and works towards the start.

Check if there is another element on the list.

if another element

Get the next element.

the next element.

Remove the current element.

A minimal String utility class.

A minimal String utility class. Designed for internal jsoup use only.

Join a collection of strings by a separator

collection of string objects string to place between strings joined string

Join a collection of strings by a separator

iterator of string objects string to place between strings joined string

Returns space padding

amount of padding desired string of spaces * width

Tests if a string is blank: null, empty, or only whitespace (" ", \r\n, \t, etc)

string to test if string is blank

Tests if a string is numeric, i.e.

Tests if a string is numeric, i.e. contains only digit characters string to test true if only digit chars, false if empty or null or contains non-digit chars

Tests if a code point is "whitespace" as defined in the HTML spec.

code point to test true if code point is whitespace, false otherwise

Normalise the whitespace within this string; multiple spaces collapse to a single, and all whitespace characters (e.g.

Normalise the whitespace within this string; multiple spaces collapse to a single, and all whitespace characters (e.g. newline, tab) convert to a simple space content to normalise normalised string
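The collapsing behaviour could be sketched with a single regular expression replacement. The class WhitespaceSketch is hypothetical, and treating space, tab, newline, form feed and carriage return as whitespace is an assumption based on the HTML definition.

```java
/** Hypothetical sketch: collapses runs of HTML whitespace into single spaces. */
final class WhitespaceSketch {
    private WhitespaceSketch() {
    }

    static String normaliseWhitespace(String content) {
        // Every run of whitespace characters becomes exactly one space; nothing is trimmed.
        return content.replaceAll("[ \\t\\n\\f\\r]+", " ");
    }
}
```

Note that leading and trailing whitespace is collapsed to one space, not removed.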

After normalizing the whitespace within a string, appends it to a string builder.

builder to append to string to normalize whitespace within set to true if you wish to remove any leading whitespace

Create a new absolute URL, from a provided existing absolute URL and a relative URL component.

the existing absolute base URL the relative URL to resolve. (If it's already absolute, it will be returned) the resolved absolute URL

Create a new absolute URL, from a provided existing absolute URL and a relative URL component.

the existing absolute base URL the relative URL to resolve. (If it's already absolute, it will be returned) an absolute URL if one was able to be generated, or the empty string if not
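The string-based variant, which returns an empty string on failure, might be sketched as below. The sketch uses java.net.URI for the resolution, which is a substitution chosen for brevity and not necessarily what the library does. The class UrlResolveSketch is hypothetical.

```java
import java.net.URI;

/** Hypothetical sketch: resolves a relative URL against an absolute base URL. */
final class UrlResolveSketch {
    private UrlResolveSketch() {
    }

    static String resolve(String baseUrl, String relUrl) {
        try {
            URI base = URI.create(baseUrl);
            if (!base.isAbsolute()) {
                return "";
            }
            // An already absolute relUrl is returned unchanged by URI.resolve.
            return base.resolve(relUrl).toString();
        } catch (IllegalArgumentException e) {
            // URI.create and URI.resolve(String) throw this for malformed input.
            return "";
        }
    }
}
```

For example, resolving "../c.png" against "http://example.com/a/b.html" produces "http://example.com/c.png".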

Simple validation methods.

Simple validation methods. Designed for jsoup internal use

Validates that the object is not null

object to test

Validates that the object is not null

object to test message to output if validation fails

Validates that the value is true

object to test

Validates that the value is true

object to test message to output if validation fails

Validates that the value is false

object to test

Validates that the value is false

object to test message to output if validation fails

Validates that the array contains no null elements

the array to test

Validates that the array contains no null elements

the array to test message to output if validation fails

Validates that the string is not empty

the string to test

Validates that the string is not empty

the string to test message to output if validation fails

Cause a failure.

message to output.

Signals that a HTTP request resulted in a not OK HTTP response.

The core public access point to the jsoup functionality.

Jonathan Hedley

Parse HTML into a Document.

Parse HTML into a Document. The parser will make a sensible, balanced document tree out of any HTML. HTML to parse The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag. sane HTML

Parse HTML into a Document, using the provided Parser.

Parse HTML into a Document, using the provided Parser. You can provide an alternate parser, such as a simple XML (non-HTML) parser. HTML to parse The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag. alternate parser to use. sane HTML

Parse HTML into a Document.

Parse HTML into a Document. As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag. HTML to parse sane HTML

Parse XML into a Document.

Parse XML into a Document. The parser will make a sensible, balanced document tree out of any HTML. XML to parse The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag. sane XML

Parse XML into a Document.

Parse XML into a Document. The parser will make a sensible, balanced document tree out of any HTML. XML to parse sane XML

Parse XML into a Document.

Parse XML into a Document. The parser will make a sensible, balanced document tree out of any HTML. input stream to read. Make sure to close it after parsing. (optional) character set of file contents. Set to to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). The URL where the HTML was retrieved from, to resolve relative links against. sane XML

Parse XML into a Document.

Parse the contents of a file as HTML.

file to load HTML from (optional) character set of file contents. Set to to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). The URL where the HTML was retrieved from, to resolve relative links against. sane HTML

Parse the contents of a file as HTML.

Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs. file to load HTML from (optional) character set of file contents. Set to to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). sane HTML

Read an input stream, and parse it to a Document.

input stream to read. Make sure to close it after parsing. (optional) character set of file contents. Set to to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). The URL where the HTML was retrieved from, to resolve relative links against. sane HTML

Read an input stream, and parse it to a Document.

Read an input stream, and parse it to a Document. You can provide an alternate parser, such as a simple XML (non-HTML) parser. input stream to read. Make sure to close it after parsing. (optional) character set of file contents. Set to to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). The URL where the HTML was retrieved from, to resolve relative links against. alternate parser to use. sane HTML

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

body HTML fragment URL to resolve relative URLs against. sane HTML document

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

body HTML fragment sane HTML document

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

input untrusted HTML (body fragment) URL to resolve relative URLs against white-list of permitted HTML elements safe HTML (body fragment)

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

input untrusted HTML (body fragment) white-list of permitted HTML elements safe HTML (body fragment)

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

input untrusted HTML (body fragment) URL to resolve relative URLs against white-list of permitted HTML elements document output settings; use to control pretty-printing and entity escape modes safe HTML (body fragment)

Test if the input HTML has only tags and attributes allowed by the Whitelist.

Test if the input HTML has only tags and attributes allowed by the Whitelist. Useful for form validation. The input HTML should still be run through the cleaner to set up enforced attributes, and to tidy the output. HTML to test whitelist to test against true if no tags or attributes were removed; false otherwise

A single key + value attribute.

A single key + value attribute. Keys are trimmed and normalised to lower-case. Jonathan Hedley, jonathan@hedley.net

Create a new attribute from unencoded (raw) key and value.

attribute key attribute value

Get the attribute key.

the attribute key

Set the attribute key.

Set the attribute key. Gets normalised as per the constructor method. the new key; must not be null

Get the attribute value.

the attribute value

Set the attribute value.

the new attribute value; must not be null

Get the HTML representation of this attribute; e.g.

Get the HTML representation of this attribute; e.g. href="index.html" . HTML

Get the string representation of this attribute, implemented as .

string

Create a new Attribute from an unencoded key and a HTML attribute encoded value.

assumes the key is not encoded, as can be only run of simple \w chars. HTML attribute encoded value attribute

Collapsible if it's a boolean attribute and value is empty or same as name

Outputsettings Returns whether collapsible or not

Get an attribute value by key.

the attribute key the attribute value if set; or empty string if not set.

Set a new attribute, or replace an existing one by key.

attribute key attribute value

Set a new boolean attribute, remove attribute if value is false.

attribute key attribute value

Set a new attribute, or replace an existing one by key.

attribute

Remove an attribute by key.

attribute key to remove

Tests if these attributes contain an attribute with this key.

key to check for true if key exists, false otherwise

Get the number of attributes in this set.

size

Add all the attributes from the incoming set to this set.

attributes to add to these attributes.

Get the attributes as a List, for iteration.

Get the attributes as a List, for iteration. Do not modify the keys of the attributes via this view, as changes to keys will not be recognised in the containing set. a view of the attributes as a List.

Retrieves a filtered view of attributes that are HTML5 custom data attributes; that is, attributes with keys starting with data- .

map of custom data attributes.

Get the HTML representation of these attributes.

HTML

Checks if these attributes are equal to another set of attributes, by comparing the two sets

attributes to compare with if both sets of attributes have the same content

Calculates the hashcode of these attributes, by iterating all attributes and summing their hashcodes.

calculated hashcode

A boolean attribute that is written out without any value.

Create a new boolean attribute from unencoded (raw) key.

attribute key

A comment node.

Jonathan Hedley, jonathan@hedley.net

Create a new comment node.

The contents of the comment base URI

Get the contents of the comment.

comment content

A data node, for contents of style, script tags etc, where contents should not show in text().

Jonathan Hedley, jonathan@hedley.net

Create a new DataNode.

data contents base URI

Get the data contents of this node.

Get the data contents of this node. Will be unescaped and with original new lines, space etc. data

Set the data contents of this node.

unencoded data this node, for chaining

Create a new DataNode from HTML encoded data.

encoded data base URI new DataNode

A HTML Document.

Jonathan Hedley, jonathan@hedley.net

Create a new, empty Document.

base URI of document

Create a valid, empty shell of a document, suitable for adding more elements to.

baseUri of document document with html, head, and body elements.

Get the URL this Document was parsed from.

Get the URL this Document was parsed from. If the starting URL is a redirect, this will return the final URL from which the document was served from. location

Accessor to the document's head element.

head

Accessor to the document's body element.

body

Get the string contents of the document's title element.

Trimmed title, or empty string if none set.

Set the document's title element.

Set the document's title element. Updates the existing element, or adds title to head if not present string to set as title

Create a new Element, with this document's base uri.

Create a new Element, with this document's base uri. Does not make the new element a child of this document. element tag name (e.g. a ) new element

Normalise the document.

Normalise the document. This happens after the parse phase so generally does not need to be called. Moves any text content that is not in the body element into the body. this document after normalisation

Set the text of the body of this document.

Set the text of the body of this document. Any existing nodes within the body will be cleared. unencoded text this document

Sets the charset used in this document.

Sets the charset used in this document. This method is equivalent to OutputSettings.charset(Charset) but in addition it updates the charset / encoding element within the document. This enables meta charset update. If there's no element with charset / encoding information yet it will be created. Obsolete charset / encoding definitions are removed! Elements used: Html: <meta charset="CHARSET"> Xml: <?xml version="1.0" encoding="CHARSET"> Charset

Returns the charset used in this document.

Returns the charset used in this document. This method is equivalent to . Current Charset

Sets whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.

Sets whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not. If set to false (default) there are no elements modified. If true the element updated on charset changes, false if not

Returns whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.

Returns true if the element is updated on charset changes, false if not

Ensures a meta charset (html) or xml declaration (xml) with the current encoding used.

Ensures a meta charset (html) or xml declaration (xml) with the current encoding used. This only applies with updateMetaCharset set to true, otherwise this method does nothing. An existing element gets updated with the current charset. If there's no element yet it will be inserted. Obsolete elements are removed. Elements used: Html: <meta charset="CHARSET"> Xml: <?xml version="1.0" encoding="CHARSET">

Get the document's current output settings.

the document's current output settings.

Set the document's output settings.

new output settings. this document, for chaining.

A Document's output settings control the form of the text() and html() methods.

Get the document's current HTML escape mode: base, which provides a limited set of named HTML entities and escapes other characters as numbered entities for maximum compatibility; or extended, which uses the complete set of HTML named entities.

Set the document's escape mode, which determines how characters are escaped when the output character set does not support a given character:- using either a named or a numbered escape.

the new escape mode to use the document's output settings, for chaining

Get the document's current output charset, which is used to control which characters are escaped when generating HTML (via the html() methods), and which are kept intact.

Get the document's current output charset, which is used to control which characters are escaped when generating HTML (via the html() methods), and which are kept intact. Where possible (when parsing from a URL or File), the document's output charset is automatically set to the input charset. Otherwise, it defaults to UTF-8. the document's current charset.

Update the document's output charset.

the new charset to use. the document's output settings, for chaining

Update the document's output charset.

the new charset (by name) to use. the document's output settings, for chaining

Get the document's current output syntax.

current syntax

Set the document's output syntax.

Set the document's output syntax. Either html , with empty tags and boolean attributes (etc), or xml , with self-closing tags. serialization syntax the document's output settings, for chaining

Get if pretty printing is enabled.

Get if pretty printing is enabled. Default is true. If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input. if pretty printing is enabled.

Enable or disable pretty printing.

new pretty print setting this, for chaining

Get if outline mode is enabled.

Get if outline mode is enabled. Default is false. If enabled, the HTML output methods will consider all tags as block. if outline mode is enabled.

Enable or disable HTML outline mode.

new outline setting this, for chaining

Get the current tag indent amount, used when pretty printing.

the current indent amount

Set the indent amount for pretty printing

number of spaces to use for indenting each level. Must be >= 0. this, for chaining

The output serialization syntax.

A <!DOCTYPE> node.

Create a new doctype element.

the doctype's name the doctype's public ID the doctype's system ID the doctype's base URI

An HTML element consists of a tag name, attributes, and child nodes (including text nodes and other elements).

An HTML element consists of a tag name, attributes, and child nodes (including text nodes and other elements). From an Element, you can extract data, traverse the node graph, and manipulate the HTML. Jonathan Hedley, jonathan@hedley.net

Create a new, standalone Element.

Create a new, standalone Element. (Standalone in that it has no parent.) tag of this element the base URI initial attributes

Create a new Element from a tag and a base URI.

element tag the base URI of this element. It is acceptable for the base URI to be an empty string, but not null.

Get the name of the tag for this element.

Get the name of the tag for this element. E.g. div the tag name

Change the tag of this element.

Change the tag of this element. For example, convert a <span> to a <div> with el.tagName("div");. new tag name for this element this element, for chaining

Get the Tag for this element.

the tag object

Test if this element is a block-level element.

Test if this element is a block-level element. (E.g. <div> == true or an inline element <span> == false ). true if block, false if not (and thus inline)

Get the id attribute of this element.

The id attribute, if present, or an empty string if not.

Set an attribute value on this element.

Set an attribute value on this element. If this element already has an attribute with the key, its value is updated; otherwise, a new attribute is added. this element

Set a boolean attribute value on this element.

Set a boolean attribute value on this element. Setting to true sets the attribute value to "" and marks the attribute as boolean so no value is written out. Setting to false removes the attribute with the same key if it exists. the attribute key the attribute value this element

Get this element's HTML5 custom data attributes.

Get this element's HTML5 custom data attributes. Each attribute in the element that has a key starting with "data-" is included in the dataset. E.g., the element <div data-package="jsoup" data-language="Java" class="group">... has the dataset package=jsoup, language=Java. This map is a filtered view of the element's attribute map. Changes to one map (add, remove, update) are reflected in the other map. You can find elements that have data attributes using the [^data-] attribute key prefix selector. a map of key=value custom data attributes.

Get this element's parent and ancestors, up to the document root.

this element's stack of parents, closest first.

Get a child element of this element, by its 0-based index number.

Get a child element of this element, by its 0-based index number. Note that an element can have both mixed Nodes and Elements as children. This method inspects a filtered list of children that are elements, and the index is based on that filtered list. the index number of the element to retrieve the child element, if it exists, otherwise throws an IndexOutOfBoundsException

Get this element's child elements.

Get this element's child elements. This is effectively a filter on childNodes() to get Element nodes. child elements. If this element has no children, returns an empty list.

Get this element's child text nodes.

Get this element's child text nodes. The list is unmodifiable but the text nodes may be manipulated. This is effectively a filter on childNodes() to get Text nodes. child text nodes. If this element has no text nodes, returns an empty list. For example, with the input HTML: <p>One <span>Two</span> Three <br> Four</p> with the p element selected:
p.text() = "One Two Three Four"
p.ownText() = "One Three Four"
p.children() = Elements[<span>, <br>]
p.childNodes() = List<Node>["One ", <span>, " Three ", <br>, " Four"]
p.textNodes() = List<TextNode>["One ", " Three ", " Four"]

Get this element's child data nodes.

Get this element's child data nodes. The list is unmodifiable but the data nodes may be manipulated. This is effectively a filter on childNodes() to get Data nodes. child data nodes. If this element has no data nodes, returns an empty list.

Find elements that match the CSS query, with this element as the starting context.

Find elements that match the CSS query, with this element as the starting context. Matched elements may include this element, or any of its children. This method is generally more powerful to use than the DOM-type getElementBy* methods, because multiple filters can be combined, e.g.:
el.select("a[href]") - finds links (a tags with href attributes)
el.select("a[href*=example.com]") - finds links pointing to example.com (loosely)
See the selector query syntax documentation. a CSS-like query. elements that match the query (empty if none match)

Add a child node to this element.

node to add. this element, so that you can add more child nodes or elements.

Add a node to the start of this element's children.

node to add. this element, so that you can add more child nodes or elements.

Inserts the given child node into this element at the specified index.

Inserts the given child node into this element at the specified index. Current node will be shifted to the right. The inserted nodes will be moved from their current parent. To prevent moving, copy the node first. 0-based index to insert children at. Specify 0 to insert at the start, -1 at the end child node to insert this element, for chaining.

Inserts the given child nodes into this element at the specified index.

Inserts the given child nodes into this element at the specified index. Current nodes will be shifted to the right. The inserted nodes will be moved from their current parent. To prevent moving, copy the nodes first. 0-based index to insert children at. Specify 0 to insert at the start, -1 at the end child nodes to insert this element, for chaining.

Create a new element by tag name, and add it as the last child.

the name of the tag (e.g. div ). the new element, to allow you to add content to it, e.g.: parent.appendElement("h1").attr("id", "header").text("Welcome");

Create a new element by tag name, and add it as the first child.

the name of the tag (e.g. div ). the new element, to allow you to add content to it, e.g.: parent.prependElement("h1").attr("id", "header").text("Welcome");

Create and append a new TextNode to this element.

the unencoded text to add this element

Create and prepend a new TextNode to this element.

the unencoded text to add this element

Add inner HTML to this element.

Add inner HTML to this element. The supplied HTML will be parsed, and each node appended to the end of the children. HTML to add inside this element, after the existing HTML this element

Add inner HTML into this element.

Add inner HTML into this element. The supplied HTML will be parsed, and each node prepended to the start of the element's children. HTML to add inside this element, before the existing HTML this element

Insert the specified HTML into the DOM before this element (as a preceding sibling).

HTML to add before this element this element, for chaining

Insert the specified node into the DOM before this node (as a preceding sibling).

to add before this element this Element, for chaining

Insert the specified HTML into the DOM after this element (as a following sibling).

HTML to add after this element this element, for chaining

Insert the specified node into the DOM after this node (as a following sibling).

to add after this element this element, for chaining

Remove all of the element's child nodes.

Remove all of the element's child nodes. Any attributes are left as-is. this element

Wrap the supplied HTML around this element.

HTML to wrap around this element, e.g. <div class="head"></div> . Can be arbitrarily deep. this element, for chaining.

Get a CSS selector that will uniquely select this element.

Get a CSS selector that will uniquely select this element. If the element has an ID, returns #id; otherwise returns the parent (if any) CSS selector, followed by '>' , followed by a unique selector for the element (tag.class.class:nth-child(n)). the CSS Path that can be used to retrieve the element in a selector.
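The path construction described above can be sketched on a tiny stand-in element type. The El class, its fields, and the decision to always append :nth-child(n) for non-root elements are assumptions for illustration only; the library's own element class and uniqueness checks may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the library's implementation.
final class CssPathSketch {
    /** Minimal stand-in for an element: tag, id, classes, parent and children. */
    static final class El {
        final String tag;
        final String id;
        final List<String> classes;
        final List<El> children = new ArrayList<>();
        El parent;

        El(String tag, String id, String... classNames) {
            this.tag = tag;
            this.id = id;
            this.classes = Arrays.asList(classNames);
        }

        /** Appends a child and returns the child, so trees can be built step by step. */
        El append(El child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    static String cssSelector(El el) {
        if (!el.id.isEmpty()) {
            return "#" + el.id; // an ID is assumed to be unique on its own
        }
        StringBuilder selector = new StringBuilder(el.tag);
        for (String className : el.classes) {
            selector.append('.').append(className);
        }
        if (el.parent == null) {
            return selector.toString();
        }
        int position = el.parent.children.indexOf(el) + 1; // nth-child is 1-based
        selector.append(":nth-child(").append(position).append(')');
        return cssSelector(el.parent) + " > " + selector;
    }

    public static void main(String[] args) {
        El body = new El("html", "").append(new El("body", "main"));
        El div = body.append(new El("div", "", "box"));
        div.append(new El("p", ""));
        El note = div.append(new El("p", "", "note"));
        System.out.println(cssSelector(note));
    }
}
```

The recursion stops at the nearest ancestor that has an ID, which keeps the generated path short.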

Get sibling elements.

Get sibling elements. If the element has no sibling elements, returns an empty list. An element is not a sibling of itself, so will not be included in the returned list. sibling elements

Gets the next sibling element of this element.

Gets the next sibling element of this element. E.g., if a div contains two p s, the nextElementSibling of the first p is the second p. This is similar to nextSibling(), but specifically finds only Elements. the next element, or null if there is no next element

Gets the previous element sibling of this element.

the previous element, or null if there is no previous element

Gets the first element sibling of this element.

the first sibling that is an element (aka the parent's first element child)

Get the list index of this element in its element sibling list.

Get the list index of this element in its element sibling list. I.e. if this is the first element sibling, returns 0. position in element sibling list

Gets the last element sibling of this element

the last sibling that is an element (aka the parent's last element child)

Finds elements, including and recursively under this element, with the specified tag name.

The tag name to search for (case insensitively). a matching unmodifiable list of elements. Will be empty if this element and none of its children match.

Find an element by ID, including or under this element.

Find an element by ID, including or under this element. Note that this finds the first matching ID, starting with this element. If you search down from a different starting point, it is possible to find a different element by ID. To find a unique element by ID within a Document, search from the Document itself. The ID to search for. The first matching element by ID, starting with this element, or null if none found.

Find elements that have this class, including or under this element.

Find elements that have this class, including or under this element. Case insensitive. Elements can have multiple classes (e.g. <div class="header round first">). This method checks each class, so you can find the above with el.getElementsByClass("header");. the name of the class to search for. elements with the supplied class name, empty if none

Find elements that have a named attribute set.

Find elements that have a named attribute set. Case insensitive. name of the attribute, e.g. href elements that have this attribute, empty if none

Find elements that have an attribute name starting with the supplied prefix.

Find elements that have an attribute name starting with the supplied prefix. Use data- to find elements that have HTML5 datasets. name prefix of the attribute e.g. data- elements that have attribute names that start with the prefix, empty if none.

Find elements that have an attribute with the specific value.

Find elements that have an attribute with the specific value. Case insensitive. name of the attribute value of the attribute elements that have this attribute with this value, empty if none

Find elements that either do not have this attribute, or have it with a different value.

Find elements that either do not have this attribute, or have it with a different value. Case insensitive. name of the attribute value of the attribute elements that do not have a matching attribute

Find elements that have attributes that start with the value prefix.

Find elements that have attributes that start with the value prefix. Case insensitive. name of the attribute start of attribute value elements that have attributes that start with the value prefix

Find elements that have attributes that end with the value suffix.

Find elements that have attributes that end with the value suffix. Case insensitive. name of the attribute end of the attribute value elements that have attributes that end with the value suffix

Find elements that have attributes whose value contains the match string.

Find elements that have attributes whose value contains the match string. Case insensitive. name of the attribute substring of value to search for elements that have attributes containing this text

Find elements that have attributes whose values match the supplied regular expression.

name of the attribute compiled regular expression to match against attribute values elements that have attributes matching this regular expression

Find elements that have attributes whose values match the supplied regular expression.

name of the attribute regular expression to match against attribute values. You can use embedded flags (such as (?i) and (?m)) to control regex options. elements that have attributes matching this regular expression

Find elements whose sibling index is less than the supplied index.

0-based index elements less than index

Find elements whose sibling index is greater than the supplied index.

0-based index elements greater than index

Find elements whose sibling index is equal to the supplied index.

0-based index elements equal to index

Find elements that contain the specified string.

Find elements that contain the specified string. The search is case insensitive. The text may appear directly in the element, or in any of its descendants. to look for in the element's text elements that contain the string, case insensitive.

Find elements that directly contain the specified string.

Find elements that directly contain the specified string. The search is case insensitive. The text must appear directly in the element, not in any of its descendants. to look for in the element's own text elements that contain the string, case insensitive.

Find elements whose text matches the supplied regular expression.

regular expression to match text against elements matching the supplied regular expression.

Find elements whose text matches the supplied regular expression.

regular expression to match text against. You can use embedded flags (such as (?i) and (?m)) to control regex options. elements matching the supplied regular expression.

Find elements whose own text matches the supplied regular expression.

regular expression to match text against elements matching the supplied regular expression.

Find elements whose own text matches the supplied regular expression.

regular expression to match text against. You can use embedded flags (such as (?i) and (?m)) to control regex options. elements matching the supplied regular expression.

Find all elements under this element (including self, and children of children).

all elements

Gets the combined text of this element and all its children.

Gets the combined text of this element and all its children. Whitespace is normalized and trimmed. For example, given HTML <p>Hello <b>there</b> now! </p> , p.text() returns "Hello there now!" unencoded text, or empty string if none.
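As a rough illustration of the normalisation and trimming mentioned above, the following sketch collapses runs of whitespace to a single space and drops leading and trailing whitespace. The class and method names are hypothetical, and the library's own routine may treat some characters differently.

```java
// Hypothetical sketch of whitespace normalisation, not the library's code.
final class TextNormalizeSketch {
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder();
        boolean lastWasSpace = true; // starts true so leading whitespace is dropped
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.isWhitespace(c)) {
                if (!lastWasSpace) {
                    out.append(' '); // a run of whitespace becomes one space
                }
                lastWasSpace = true;
            } else {
                out.append(c);
                lastWasSpace = false;
            }
        }
        int length = out.length();
        if (length > 0 && out.charAt(length - 1) == ' ') {
            out.setLength(length - 1); // trim the single trailing space, if any
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The concatenated text of <p>Hello <b>there</b> now! </p>
        System.out.println(normalize("Hello there now! "));
    }
}
```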

Gets the text owned by this element only; does not get the combined text of all children.

Gets the text owned by this element only; does not get the combined text of all children. For example, given HTML <p>Hello <b>there</b> now!</p> , p.ownText() returns "Hello now!" , whereas p.text() returns "Hello there now!". Note that the text within the b element is not returned, as it is not a direct child of the p element. unencoded text, or empty string if none.

Set the text of this element.

Set the text of this element. Any existing contents (text or elements) will be cleared unencoded text this element

Test if this element has any text content (that is not just whitespace).

true if element has non-blank text content.

Get the combined data of this element.

Get the combined data of this element. Data is e.g. the inside of a script tag. the data, or empty string if none

Gets the literal value of this element's "class" attribute, which may include multiple class names, space separated.

Gets the literal value of this element's "class" attribute, which may include multiple class names, space separated. (E.g. on <div class="header gray"> returns "header gray".) The literal class attribute, or empty string if no class attribute set.

Get all of the element's class names.

Get all of the element's class names. E.g. on element <div class="header gray"> , returns a set of two elements "header", "gray" . Note that modifications to this set are not pushed to the backing class attribute; use the setter that takes a set of class names to persist them. set of classnames, empty if no class attribute

Set the element's class attribute to the supplied class names.

set of classes this element, for chaining

Tests if this element has a class.

Tests if this element has a class. Case insensitive. name of class to check for true if it does, false if not

Add a class name to this element's class attribute.

class name to add this element

Remove a class name from this element's class attribute.

class name to remove this element

Toggle a class name on this element's class attribute: if present, remove it; otherwise add it.

class name to toggle this element
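The add, remove, and toggle operations above all amount to editing the set of names held in the space-separated class attribute. The sketch below shows the toggle case on a plain string; the helper names are hypothetical, and unlike the documented hasClass check this sketch compares names case-sensitively.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch operating on the raw attribute string.
final class ClassAttributeSketch {
    /** Splits a space-separated class attribute into an ordered set of names. */
    static Set<String> parse(String classAttribute) {
        Set<String> names = new LinkedHashSet<>();
        for (String name : classAttribute.trim().split("\\s+")) {
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Removes the class if present, otherwise adds it; returns the new attribute value. */
    static String toggle(String classAttribute, String className) {
        Set<String> names = parse(classAttribute);
        if (!names.remove(className)) {
            names.add(className);
        }
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        System.out.println(toggle("header gray", "gray"));
        System.out.println(toggle("header", "gray"));
    }
}
```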

Get the value of a form element (input, textarea, etc).

the value of the form element, or empty string if not set.

Set the value of a form element (input, textarea, etc).

value to set this element (for chaining)

Retrieves the element's inner HTML.

Retrieves the element's inner HTML. E.g. on a <div> with one empty <p> , would return <p></p> . (Whereas outerHtml() would return <div><p></p></div> .) String of HTML.

Set this element's inner HTML.

Set this element's inner HTML. Clears the existing HTML first. HTML to parse and set into this element this element

HTML entities, and escape routines.

HTML entities, and escape routines. Source: W3C HTML named character references.

Restricted entities suitable for XHTML output: lt, gt, amp, and quot only.

Default HTML output entities.

Complete HTML entities.

Check if the input is a known named entity

the possible entity name (e.g. "lt" or "amp") true if a known named entity

Check if the input is a known named entity in the base entity set.

the possible entity name (e.g. "lt" or "amp") true if a known named entity in the base set

Get the Character value of the named entity

named entity (e.g. "lt" or "amp") the Character value of the named entity (e.g. ' < ' or ' & ')

Unescape the input string.

to un-HTML-escape if "strict" (that is, requires trailing ';' char, otherwise that's optional) unescaped string
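The strict flag can be illustrated with a small sketch: in strict mode an entity is only recognised when it ends with ';', otherwise the semicolon is optional. The four-entry entity table, the regular expressions, and the class name are assumptions for illustration; the library uses far larger entity sets and its matching rules may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch with a deliberately tiny entity table.
final class UnescapeSketch {
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("amp", "&");
        ENTITIES.put("quot", "\"");
    }

    private static final Pattern STRICT = Pattern.compile("&(#x?[0-9a-fA-F]+|\\w+);");
    private static final Pattern RELAXED = Pattern.compile("&(#x?[0-9a-fA-F]+|\\w+);?");

    static String unescape(String input, boolean strict) {
        Matcher matcher = (strict ? STRICT : RELAXED).matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String decoded = decode(matcher.group(1));
            // Unknown references are copied through unchanged.
            String replacement = decoded == null ? matcher.group() : decoded;
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    private static String decode(String reference) {
        if (reference.startsWith("#")) {
            boolean hex = reference.startsWith("#x");
            try {
                int codePoint = Integer.parseInt(reference.substring(hex ? 2 : 1), hex ? 16 : 10);
                return Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint)) : null;
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return ENTITIES.get(reference);
    }

    public static void main(String[] args) {
        System.out.println(unescape("a &lt; b &amp c", false));
        System.out.println(unescape("a &lt; b &amp c", true));
    }
}
```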

An HTML Form Element provides ready access to the form fields/controls that are associated with it.

Create a new, standalone form element.

tag of this element the base URI initial attributes

Get the list of form control elements associated with this form.

form controls associated with this element.

Add a form control element to this form.

form control to add this form element, for chaining

Get the data that this form submits.

Get the data that this form submits. The returned list is a copy of the data, and changes to the contents of the list will not be reflected in the DOM. a list of key vals

The base, abstract Node model.

The base, abstract Node model. Elements, Documents, Comments etc are all Node instances. Jonathan Hedley, jonathan@hedley.net

Create a new Node.

base URI attributes (not null, but may be empty)

Default constructor.

Default constructor. Doesn't set up base URI, children, or attributes; use with caution.

Get the node name of this node.

Get the node name of this node. Use for debugging purposes and not logic switching (for that, use instanceof). node name

Get an attribute's value by its key.

Get an attribute's value by its key. To get an absolute URL from an attribute that may be a relative URL, prefix the key with abs:, which is a shortcut to the absUrl method. E.g.:

String url = a.attr("abs:href");

The attribute key. The attribute, or empty string if not present (to avoid nulls).

Get all of the element's attributes.

attributes (which implements iterable, in same order as presented in original HTML).

Set an attribute (key=value).

Set an attribute (key=value). If the attribute already exists, it is replaced. The attribute key. The attribute value. this (for chaining)

Test if this element has an attribute.

The attribute key to check. true if the attribute exists, false if not.

Remove an attribute from this element.

The attribute to remove. this (for chaining)

Get the base URI of this node.

base URI

Update the base URI of this node and all of its descendants.

base URI to set

Get an absolute URL from a URL attribute that may be relative (e.g. an <a href> or <img src>).

Get an absolute URL from a URL attribute that may be relative (e.g. an <a href> or <img src>). E.g.: String absUrl = linkEl.absUrl("href"); If the attribute value is already absolute (i.e. it starts with a protocol, like http:// or https:// etc), and it successfully parses as a URL, the attribute is returned directly. Otherwise, it is treated as a URL relative to the element's base URI, and made absolute using that. As an alternative, you can use the attr method with the abs: prefix, e.g.: String absUrl = linkEl.attr("abs:href"); The attribute key. An absolute URL if one could be made, or an empty string (not null) if the attribute was missing or could not be made successfully into a URL.
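The resolution rule described above can be approximated with the standard java.net.URI class. This is only a sketch under stated assumptions: the helper name absUrl and the example.com addresses are illustrative, and the library may use a different URL parser with different edge-case behaviour.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch built on java.net.URI, not the library's code.
final class AbsUrlSketch {
    /** Returns an absolute URL, or an empty string (never null) if none can be made. */
    static String absUrl(String baseUri, String attributeValue) {
        if (attributeValue == null || attributeValue.isEmpty()) {
            return ""; // missing attribute
        }
        try {
            URI value = new URI(attributeValue);
            if (value.isAbsolute()) {
                return value.toString(); // already absolute: returned directly
            }
            URI base = new URI(baseUri);
            if (!base.isAbsolute()) {
                return ""; // nothing to resolve against
            }
            return base.resolve(value).toString();
        } catch (URISyntaxException e) {
            return ""; // could not be made into a URL
        }
    }

    public static void main(String[] args) {
        System.out.println(absUrl("http://example.com/docs/page.html", "../img/logo.png"));
        System.out.println(absUrl("http://example.com/docs/page.html", "https://other.example/x"));
    }
}
```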

Get a child node by its 0-based index.

index of child node the child node at this index. Throws an IndexOutOfBoundsException if the index is out of bounds.

Get this node's children.

Get this node's children. Presented as an unmodifiable list: new children can not be added, but the child nodes themselves can be manipulated. list of children. If no children, returns an empty list.

Returns a deep copy of this node's children.

Returns a deep copy of this node's children. Changes made to these nodes will not be reflected in the original nodes a deep copy of this node's children

Get the number of child nodes that this node holds.

the number of child nodes that this node holds.

Gets this node's parent node.

parent node; or null if no parent.

Gets this node's parent node.

Gets this node's parent node. Not overridable by extending classes, so useful if you really just need the Node type. parent node; or null if no parent.

Gets the Document associated with this Node.

the Document associated with this Node, or null if there is no such Document.

Remove (delete) this node from the DOM tree.

Remove (delete) this node from the DOM tree. If this node has children, they are also removed.

Insert the specified HTML into the DOM before this node (i.e. as a preceding sibling).

HTML to add before this node this node, for chaining

Insert the specified node into the DOM before this node (i.e. as a preceding sibling).

to add before this node this node, for chaining

Insert the specified HTML into the DOM after this node (i.e. as a following sibling).

HTML to add after this node this node, for chaining

Insert the specified node into the DOM after this node (i.e. as a following sibling).

to add after this node this node, for chaining

Wrap the supplied HTML around this node.

HTML to wrap around this element, e.g. <div class="head"></div> . Can be arbitrarily deep. this node, for chaining.

Removes this node from the DOM, and moves its children up into the node's parent.

Removes this node from the DOM, and moves its children up into the node's parent. This has the effect of dropping the node but keeping its children. For example, with the input html: <div>One <span>Two <b>Three</b></span></div> Calling element.unwrap() on the span element will result in the html: <div>One Two <b>Three</b></div> and the "Two " being returned. the first child of this node, after the node has been unwrapped. Null if the node had no children.

Replace this node in the DOM with the supplied node.

the node that will replace the existing node.

Retrieves this node's sibling nodes.

Retrieves this node's sibling nodes. Similar to node.parent.childNodes() , but does not include this node (a node is not a sibling of itself). node siblings. If the node has no parent, returns an empty list.

Get this node's next sibling.

next sibling, or null if this is the last sibling

Get this node's previous sibling.

the previous sibling, or null if this is the first sibling

Get the list index of this node in its node sibling list.

Get the list index of this node in its node sibling list. I.e. if this is the first node sibling, returns 0. position in node sibling list

Perform a depth-first traversal through this node and its descendants.

the visitor callbacks to perform on each node this node, for chaining
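A depth-first traversal with visitor callbacks can be sketched as follows. The SimpleNode and Visitor types, and the head/tail callback names, are assumptions made for this illustration rather than the library's own interfaces: head is called when a node is first reached, tail after all of its descendants have been visited.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a visitor-driven depth-first walk.
final class TraversalSketch {
    static final class SimpleNode {
        final String name;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name) {
            this.name = name;
        }

        /** Adds a child and returns this node, for chaining. */
        SimpleNode add(SimpleNode child) {
            children.add(child);
            return this;
        }
    }

    interface Visitor {
        void head(SimpleNode node, int depth); // called on entering a node
        void tail(SimpleNode node, int depth); // called after its descendants
    }

    static void traverse(SimpleNode node, int depth, Visitor visitor) {
        visitor.head(node, depth);
        for (SimpleNode child : node.children) {
            traverse(child, depth + 1, visitor);
        }
        visitor.tail(node, depth);
    }

    /** Records the visiting order as open and close markers. */
    static String trace(SimpleNode root) {
        StringBuilder log = new StringBuilder();
        traverse(root, 0, new Visitor() {
            public void head(SimpleNode node, int depth) {
                log.append('<').append(node.name).append('>');
            }

            public void tail(SimpleNode node, int depth) {
                log.append("</").append(node.name).append('>');
            }
        });
        return log.toString();
    }

    public static void main(String[] args) {
        SimpleNode root = new SimpleNode("div")
                .add(new SimpleNode("span").add(new SimpleNode("b")))
                .add(new SimpleNode("p"));
        System.out.println(trace(root));
    }
}
```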

Get the outer HTML of this node.

HTML

Get the outer HTML of this node.

accumulator to place HTML into

Write this node and its children to the given Appendable.

the Appendable to write to. the supplied Appendable, for chaining.

Check if this node is the same instance of another (object identity test).

other object to compare to. true if this node is the same object instance as the other. To compare nodes by their value rather than identity, use the same-content check instead.

Check if this node has the same content as another node.

Check if this node has the same content as another node. A node is considered the same if its name, attributes and content match the other node; particularly its position in the tree does not influence its similarity. other object to compare to true if the content of this node is the same as the other

Create a stand-alone, deep copy of this node, and all of its children.

Create a stand-alone, deep copy of this node, and all of its children. The cloned node will have no siblings or parent node. As a stand-alone object, any changes made to the clone or any of its children will not impact the original node. The cloned node may be adopted into another Document or node structure by adding it there as a child node. stand-alone cloned node

A text node.

Jonathan Hedley, jonathan@hedley.net

Create a new TextNode representing the supplied (unencoded) text.

raw text base uri

Get the text content of this text node.

Unencoded, normalised text.

Set the text content of this text node.

unencoded text this, for chaining

Get the (unencoded) text of this text node, including any newlines and spaces present in the original.

text

Test if this text node is blank -- that is, empty or only whitespace (including newlines).

true if this text node is empty or only whitespace, false if it contains any text content.

Split this text node into two nodes at the specified string offset.

Split this text node into two nodes at the specified string offset. After splitting, this node will contain the original text up to the offset, and will have a new text node sibling containing the text after the offset. string offset point to split node at. the newly created text node containing the text after the offset.
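On plain strings, the split described above reduces to two substring calls. The sketch below uses a hypothetical helper, and its bounds checks (offset must be non-negative and less than the text length) are an assumption rather than something the text states.

```java
// Hypothetical sketch of the split on a plain string.
final class SplitTextSketch {
    /** Returns {head, tail}: the text up to the offset, and the text from the offset on. */
    static String[] split(String text, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("Split offset must not be negative");
        }
        if (offset >= text.length()) {
            throw new IllegalArgumentException("Split offset must be less than the text length");
        }
        return new String[] { text.substring(0, offset), text.substring(offset) };
    }

    public static void main(String[] args) {
        String[] parts = split("Hello there", 6);
        System.out.println("[" + parts[0] + "] [" + parts[1] + "]");
    }
}
```

In the node model, the head stays in the original text node and the tail becomes the new sibling node that is returned.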

Create a new TextNode from HTML encoded (aka escaped) data.

Text containing encoded HTML (e.g. &lt;) Base uri TextNode containing unencoded data (e.g. <)

An XML Declaration.

Jonathan Hedley, jonathan@hedley.net

Create a new XML declaration

name of declaration; base URI; whether it is a processing instruction

Get the name of this declaration.

name of this declaration.

Get the unencoded XML declaration.

XML declaration

CharacterReader consumes tokens off a string.

CharacterReader consumes tokens off a string. To replace the old TokenQueue.

Returns the number of characters between the current position and the next instance of the input char

scan target offset between current position and next instance of target. -1 if not found.

Returns the number of characters between the current position and the next instance of the input sequence

scan target offset between current position and next instance of target. -1 if not found.
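The key point above is that the result is an offset relative to the current position, not an absolute index into the input. A minimal sketch on a String, with a hypothetical helper name and an explicit position parameter standing in for the reader's internal state:

```java
// Hypothetical sketch: the reader's position is passed in explicitly.
final class ScanSketch {
    /** Offset from pos to the next occurrence of target, or -1 if not found. */
    static int nextIndexOf(String input, int pos, CharSequence target) {
        int found = input.indexOf(target.toString(), pos);
        return found == -1 ? -1 : found - pos;
    }

    public static void main(String[] args) {
        System.out.println(nextIndexOf("<p>One</p>", 3, "</"));
        System.out.println(nextIndexOf("<p>One</p>", 3, "<div"));
    }
}
```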

Caches short strings, as a flyweight pattern, to reduce GC load.

Caches short strings, as a flyweight pattern, to reduce GC load. Just for this doc, to prevent leaks.

Simplistic, and on hash collisions just falls back to creating a new string, vs a full HashMap with Entry list. That saves both having to create objects as hash keys, and running through the entry list, at the expense of some more duplicates.
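The caching scheme described above can be sketched as a fixed-size array indexed by hash, with no entry lists. The class name, the table size of 512, and the 12-character length limit are assumptions for illustration; on a collision this sketch simply allocates a new string and leaves the slot untouched.

```java
// Hypothetical sketch of a hash-indexed string cache without entry lists.
final class StringCacheSketch {
    private static final int MAX_CACHED_LENGTH = 12; // assumed limit for "short" strings
    private final String[] cache = new String[512];  // assumed size, must be a power of two

    String cached(char[] buffer, int start, int count) {
        if (count > MAX_CACHED_LENGTH) {
            return new String(buffer, start, count); // too long to be worth caching
        }
        int hash = 0;
        for (int i = 0; i < count; i++) {
            hash = 31 * hash + buffer[start + i];
        }
        int index = hash & (cache.length - 1); // mask keeps the index non-negative
        String hit = cache[index];
        if (hit == null) {
            hit = new String(buffer, start, count);
            cache[index] = hit;
            return hit;
        }
        if (rangeEquals(buffer, start, count, hit)) {
            return hit; // reuse the cached instance
        }
        return new String(buffer, start, count); // collision: fall back to a new string
    }

    /** Checks whether the given range of the buffer equals the string. */
    static boolean rangeEquals(char[] buffer, int start, int count, String cachedValue) {
        if (count != cachedValue.length()) {
            return false;
        }
        for (int i = 0; i < count; i++) {
            if (buffer[start + i] != cachedValue.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StringCacheSketch cache = new StringCacheSketch();
        char[] buffer = "div div".toCharArray();
        System.out.println(cache.cached(buffer, 0, 3) == cache.cached(buffer, 4, 3));
    }
}
```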

Check if the value of the provided range equals the string.

HTML Tree Builder; creates a DOM from Tokens.

11.2.5.2 Closing elements that have implied end tags

11.2.5.2 Closing elements that have implied end tags When the steps below require the UA to generate implied end tags, then, while the current node is a dd element, a dt element, an li element, an option element, an optgroup element, a p element, an rp element, or an rt element, the UA must pop the current node off the stack of open elements. If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.
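The rule quoted above can be sketched with a stack of tag names. The class name and the use of plain strings for the stack of open elements are assumptions for illustration; the tree builder itself works on element objects.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch using tag names in place of element objects.
final class ImpliedEndTagsSketch {
    static final Set<String> IMPLIED_END_TAGS = new HashSet<>(
            Arrays.asList("dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"));

    /**
     * Pops open elements while the current node has an implied end tag.
     * excludeTag may be null; if given, that tag is treated as if it were not in the list.
     */
    static void generateImpliedEndTags(Deque<String> openElements, String excludeTag) {
        while (!openElements.isEmpty()) {
            String current = openElements.peek();
            if (current.equals(excludeTag) || !IMPLIED_END_TAGS.contains(current)) {
                break;
            }
            openElements.pop();
        }
    }

    public static void main(String[] args) {
        Deque<String> openElements = new ArrayDeque<>();
        for (String tag : new String[] { "html", "body", "ul", "li", "p" }) {
            openElements.push(tag); // the most recently opened element is on top
        }
        generateImpliedEndTags(openElements, null);
        System.out.println(openElements.peek());
    }
}
```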

The Tree Builder's current state.

The Tree Builder's current state. Each state embodies the processing for the state, and transitions to other states.

A Parse Error records an error in the input HTML that occurs in either the tokenisation or the tree building phase.

Retrieve the error message.

the error message.

Retrieves the offset of the error.

error offset within input

A container for ParseErrors.

Jonathan Hedley

Parses HTML into a Document.

Parses HTML into a Document. Generally best to use one of the library's more convenient parse methods.

Create a new Parser, using the specified TreeBuilder

TreeBuilder to use to parse input into Documents.

Get the TreeBuilder currently in use.

current TreeBuilder.

Update the TreeBuilder used when parsing content.

current TreeBuilder this, for chaining

Check if parse error tracking is enabled.

current track error state.

Enable or disable parse error tracking for the next parse.

the maximum number of errors to track. Set to 0 to disable. this, for chaining

Retrieve the parse errors, if any, from the last parse.

list of parse errors, up to the size of the maximum errors tracked.
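The bounded error tracking described above, where a maximum of 0 disables tracking, can be sketched with a capped list. All names here are hypothetical, and the real parser records errors from the tokeniser and tree builder rather than through a report method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a capped error list.
final class ErrorTrackingSketch {
    /** An error message together with its offset in the input. */
    static final class RecordedError {
        final int offset;
        final String message;

        RecordedError(int offset, String message) {
            this.offset = offset;
            this.message = message;
        }
    }

    private final int maxErrors;
    private final List<RecordedError> errors = new ArrayList<>();

    ErrorTrackingSketch(int maxErrors) {
        this.maxErrors = Math.max(0, maxErrors);
    }

    boolean isTrackingErrors() {
        return maxErrors > 0; // a maximum of 0 means tracking is disabled
    }

    void report(int offset, String message) {
        if (errors.size() < maxErrors) {
            errors.add(new RecordedError(offset, message)); // errors past the cap are dropped
        }
    }

    List<RecordedError> getErrors() {
        return Collections.unmodifiableList(errors);
    }

    public static void main(String[] args) {
        ErrorTrackingSketch tracker = new ErrorTrackingSketch(2);
        tracker.report(4, "Unexpected token");
        tracker.report(9, "Unexpected end tag");
        tracker.report(15, "Unexpected character");
        System.out.println(tracker.getErrors().size());
    }
}
```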

Parse HTML into a Document.

HTML to parse base URI of document (i.e. original fetch location), for resolving relative URLs. parsed Document

Parse XML into a Document.

XML to parse base URI of document (i.e. original fetch location), for resolving relative URLs. parsed Document

Parse a fragment of HTML into a list of nodes.

Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context. the fragment of HTML to parse (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation). base URI of document (i.e. original fetch location), for resolving relative URLs. list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.

Parse a fragment of XML into a list of nodes.

the fragment of XML to parse base URI of document (i.e. original fetch location), for resolving relative URLs. list of nodes parsed from the input XML.

Parse a fragment of HTML into the body of a Document.

fragment of HTML base URI of document (i.e. original fetch location), for resolving relative URLs. Document, with empty head, and HTML parsed into body

Utility method to unescape HTML entities from a string

HTML escaped string. if the string is to be escaped in strict mode (as attributes are). an unescaped string.

HTML to parse. base URI of document (i.e. original fetch location), for resolving relative URLs. parsed Document

Create a new HTML parser.

Create a new HTML parser. This parser treats input as HTML5, and enforces the creation of a normalised document, based on a knowledge of the semantics of the incoming tags. Returns a new HTML parser.

Create a new XML parser.

Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat the input as HTML; rather, it creates a simple tree directly from the input. Returns a new simple XML parser.

HTML Tag capabilities.

Jonathan Hedley, jonathan@hedley.net

Get this tag's name.

the tag's name

Get a Tag by name.

Get a Tag by name. If not previously defined (unknown), returns a new generic tag, that can do anything. Pre-defined tags (P, DIV etc) will be ==, but unknown tags are not registered and will only .equals(). Parameter: name of tag, e.g. "p". Case insensitive. Returns the tag, either defined or new generic.
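
The identity rule for pre-defined versus unknown tags can be illustrated with a small standalone sketch. It is not the library's code: the class `TagSketch`, its two registered names, and its methods are hypothetical and only mirror the behaviour described above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a tag registry; not the library's Tag class.
final class TagSketch {
    // Registry of pre-defined tags; only these are shared instances.
    private static final Map<String, TagSketch> KNOWN = new HashMap<>();
    static {
        for (String name : new String[] {"p", "div"}) {
            KNOWN.put(name, new TagSketch(name));
        }
    }

    private final String name;

    private TagSketch(String name) { this.name = name; }

    /** Case-insensitive lookup; unknown names yield a fresh, unregistered tag. */
    static TagSketch valueOf(String tagName) {
        String key = tagName.trim().toLowerCase(Locale.ROOT);
        TagSketch known = KNOWN.get(key);
        return known != null ? known : new TagSketch(key);
    }

    @Override public boolean equals(Object o) {
        return o instanceof TagSketch && ((TagSketch) o).name.equals(name);
    }

    @Override public int hashCode() { return name.hashCode(); }
}
```

With this sketch, two lookups of "p" return the same instance, while two lookups of an unknown name return distinct instances that still compare equal with equals().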

Gets if this is a block tag.

if block tag

Gets if this tag should be formatted as a block (or as inline)

if should be formatted as block or inline

Gets if this tag can contain block tags.

if tag can contain block tags

Gets if this tag is an inline tag.

if this tag is an inline tag.

Gets if this tag is a data only tag.

if this tag is a data only tag

Get if this is an empty tag

if this is an empty tag

Get if this tag is self closing.

if this tag should be output as self closing.

Get if this is a pre-defined tag, or was auto created on parsing.

if a known tag

Check if this tagname is a known tag.

name of tag. Returns whether it is a known HTML tag.

Get if this tag should preserve whitespace within child text nodes.

if preserve whitespace

Get if this tag represents a control associated with a form.

Get if this tag represents a control associated with a form. E.g. input, textarea, output. Returns whether the tag is associated with a form.

Get if this tag represents an element that should be submitted with a form.

Get if this tag represents an element that should be submitted with a form. E.g. input, option. Returns whether the tag is submittable with a form.

Parse tokens for the Tokeniser.

Reset the data represented by this token, for reuse.

Reset the data represented by this token, for reuse. Prevents the need to create transfer objects for every piece of data, which immediately get GCed.

Reads the input stream into tokens.

Utility method to consume reader and unescape entities found within.

unescaped string from reader

States and transition activations for the Tokeniser.

handles data in title, textarea etc

Handles RawtextEndTagName, ScriptDataEndTagName, and ScriptDataEscapedEndTagName.

Handles RawtextEndTagName, ScriptDataEndTagName, and ScriptDataEscapedEndTagName. Same body impl, just different else exit transitions.

A character queue with parsing helpers.

Jonathan Hedley

Create a new TokenQueue.

string of data to back queue.

Is the queue empty?

true if no data left in queue.

Retrieves but does not remove the first character from the queue.

First character, or 0 if empty.

Add a character to the start of the queue (will be the next character retrieved).

character to add

Add a string to the start of the queue.

string to add.

Tests if the next characters on the queue match the sequence.

Tests if the next characters on the queue match the sequence. Case insensitive. Parameter: String to check queue for. Returns true if the next characters match.

Case sensitive match test.

string to case sensitively check for. Returns true if matched, false if not.

Tests if the next characters match any of the sequences.

Tests if the next characters match any of the sequences. Case insensitive. Parameter: list of strings to case insensitively check for. Returns true if any matched, false if none did.

Tests if the queue matches the sequence (as with match), and if they do, removes the matched string from the queue.

String to search for, and if found, remove from queue. Returns true if found and removed, false if not found.

Tests if queue starts with a whitespace character.

if starts with whitespace

Test if the queue matches a word character (letter or digit).

if matches a word character

Drops the next character off the queue.

Consume one character off queue.

first character on queue.

Consumes the supplied sequence off the queue.

Consumes the supplied sequence off the queue. If the queue does not start with the supplied sequence, will throw an illegal state exception, but you should be running match() against that condition. Case insensitive. Parameter: sequence to remove from head of queue.

Pulls a string off the queue, up to but exclusive of the match sequence, or to the queue running out.

String to end on (and not include in return, but leave on queue). Case sensitive. Returns the matched data consumed from queue.

Consumes to the first sequence provided, or to the end of the queue.

Consumes to the first sequence provided, or to the end of the queue. Leaves the terminator on the queue. Parameter: any number of terminators to consume to. Case insensitive. Returns the consumed string.

Pulls a string off the queue (like consumeTo), and then pulls off the matched string (but does not return it).

Pulls a string off the queue (like consumeTo), and then pulls off the matched string (but does not return it). If the queue runs out of characters before finding the seq, will return as much as it can (and queue will go isEmpty() == true). Parameter: String to match up to, and not include in return, and to pull off queue. Case sensitive. Returns the data matched from queue.
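
The difference between consuming up to a sequence and pulling the sequence off as well can be shown with a minimal standalone sketch. The class `QueueSketch` and its methods are hypothetical stand-ins that follow the descriptions above; they are not the library's TokenQueue.

```java
// Hypothetical sketch of a string-backed queue; not the library's TokenQueue.
final class QueueSketch {
    private final String data;
    private int pos = 0; // index of the next unread character

    QueueSketch(String data) { this.data = data; }

    boolean isEmpty() { return pos >= data.length(); }

    /** Returns text up to (not including) seq and leaves seq on the queue. */
    String consumeTo(String seq) {
        int idx = data.indexOf(seq, pos);            // case-sensitive search
        int end = (idx == -1) ? data.length() : idx; // run to the end if not found
        String out = data.substring(pos, end);
        pos = end;
        return out;
    }

    /** Like consumeTo, but also removes seq from the queue when it was found. */
    String chompTo(String seq) {
        String out = consumeTo(seq);
        if (data.startsWith(seq, pos)) {
            pos += seq.length();
        }
        return out;
    }

    /** Consumes and returns whatever is left. */
    String remainder() {
        String out = data.substring(pos);
        pos = data.length();
        return out;
    }
}
```

For a queue holding key=value;rest, consumeTo("=") yields key and leaves the = in place, so a following chompTo(";") yields =value and drops the semicolon.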

Pulls a balanced string off the queue.

Pulls a balanced string off the queue. E.g. if the queue is "(one (two) three) four", (,) will return "one (two) three", and leave " four" on the queue. Unbalanced openers and closers can be quoted (with ' or ") or escaped (with \). Those escapes will be left in the returned string, which is suitable for regexes (where we need to preserve the escape), but unsuitable for contains text strings; use unescape for that. Parameters: opener; closer. Returns the data matched from the queue.
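
The balanced pull can be illustrated with a simplified standalone sketch. It is an assumption-laden illustration, not the library's implementation: the class `BalancedSketch` and its method are hypothetical, it works on a plain string instead of a queue, and it deliberately ignores the quoting and escaping rules described above.

```java
// Simplified sketch: depth counting only, no quote or backslash handling.
final class BalancedSketch {
    /**
     * Returns the text between the first opener and its matching closer.
     * Returns an empty string if no opener is found; if the input ends before
     * the matching closer, returns everything after the first opener.
     */
    static String chompBalanced(String input, char open, char close) {
        int start = -1; // index just after the first opener
        int depth = 0;  // current nesting depth
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == open) {
                depth++;
                if (start == -1) {
                    start = i + 1;
                }
            } else if (c == close && depth > 0) {
                depth--;
                if (depth == 0) {
                    return input.substring(start, i);
                }
            }
        }
        return start == -1 ? "" : input.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(chompBalanced("(one (two) three) four", '(', ')'));
    }
}
```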

Unescape a \ escaped string.

backslash escaped string. Returns the unescaped string.
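
A minimal sketch of backslash unescaping, assuming the simple rule "drop an escaping backslash and keep the character after it". The class `UnescapeSketch` is hypothetical, and the library's handling of edge cases (such as long runs of backslashes) may differ.

```java
// Hypothetical sketch of backslash unescaping; edge-case behaviour is an assumption.
final class UnescapeSketch {
    /** Drops each escaping backslash and keeps the character that follows it. */
    static String unescape(String in) {
        StringBuilder out = new StringBuilder();
        boolean escaped = false; // true when the previous char was an escaping backslash
        for (char c : in.toCharArray()) {
            if (c == '\\' && !escaped) {
                escaped = true;   // skip the backslash itself
            } else {
                out.append(c);    // ordinary character, or the escaped one
                escaped = false;
            }
        }
        return out.toString();    // a trailing lone backslash is dropped
    }
}
```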

Pulls the next run of whitespace characters off the queue.

Whether consuming whitespace or not

Retrieves the next run of word type (letter or digit) off the queue.

String of word characters from queue, or empty string if none.

Consume a tag name off the queue (word or :, _, -)

tag name

Consume a CSS element selector (tag name, but | instead of : for namespaces, to not conflict with :pseudo selects).

tag name

Consume a CSS identifier (ID or class) off the queue (letter, digit, -, _) http://www.w3.org/TR/CSS2/syndata.html#value-def-identifier

identifier

Consume an attribute key off the queue (letter, digit, -, _, :)

attribute key

Consume and return whatever is left on the queue.

remainder of queue.

Jonathan Hedley

Use the XmlTreeBuilder when you want to parse XML without any of the HTML DOM rules being applied to the document.

Use the XmlTreeBuilder when you want to parse XML without any of the HTML DOM rules being applied to the document. Usage example: Document xmlDoc = Jsoup.parse(html, baseUrl, Parser.xmlParser());

Jonathan Hedley

If the stack contains an element with this tag's name, pop up the stack to remove the first occurrence.

If the stack contains an element with this tag's name, pop up the stack to remove the first occurrence. If not found, skips.

The whitelist based HTML cleaner.

The whitelist based HTML cleaner. Use to ensure that end-user provided HTML contains only the elements and attributes that you are expecting; no junk, and no cross-site scripting attacks! The HTML cleaner parses the input as HTML and then runs it through a white-list, so the output HTML can only contain HTML that is allowed by the whitelist. It is assumed that the input HTML is a body fragment; the clean methods only pull from the source's body, and the canned white-lists only allow body contained tags. Rather than interacting directly with a Cleaner object, generally see the clean methods in .

Create a new cleaner, that sanitizes documents using the supplied whitelist.

white-list to clean with

Creates a new, clean document, from the original dirty document, containing only elements allowed by the whitelist.

Creates a new, clean document, from the original dirty document, containing only elements allowed by the whitelist. The original document is not modified. Only elements from the dirty document's body are used. Parameter: untrusted base document to clean. Returns the cleaned document.

Determines if the input document is valid, against the whitelist.

Determines if the input document is valid, against the whitelist. It is considered valid if all the tags and attributes in the input HTML are allowed by the whitelist. This method can be used as a validator for user input forms. An invalid document will still be cleaned successfully using the document. If using as a validator, it is recommended to still clean the document to ensure enforced attributes are set correctly, and that the output is tidied. document to test true if no tags or attributes need to be removed; false if they do

Iterates the input and copies trusted nodes (tags, attributes, text) into the destination.

Whitelists define what HTML (elements and attributes) to allow through the cleaner.

Whitelists define what HTML (elements and attributes) to allow through the cleaner. Everything else is removed. Start with one of the defaults: If you need to allow more through (please be careful!), tweak a base whitelist with: You can remove any setting from an existing whitelist with: The cleaner and these whitelists assume that you want to clean a body fragment of HTML (to add user supplied HTML into a templated page), and not to clean a full HTML document. If the latter is the case, either wrap the document HTML around the cleaned body HTML, or create a whitelist that allows html and head elements as appropriate. If you are going to extend a whitelist, please be very careful. Make sure you understand what attributes may lead to XSS attack vectors. URL attributes are particularly vulnerable and require careful validation. See http://ha.ckers.org/xss.html for some XSS attack examples. Jonathan Hedley

This whitelist allows only text nodes: all HTML will be stripped.

whitelist

This whitelist allows only simple text formatting: b, em, i, strong, u.

This whitelist allows only simple text formatting: b, em, i, strong, u. All other HTML (tags and attributes) will be removed. whitelist

This whitelist allows a fuller range of text nodes: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, span, strike, strong, sub, sup, u, ul, and appropriate attributes.

This whitelist allows a fuller range of text nodes: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, span, strike, strong, sub, sup, u, ul, and appropriate attributes. Links (a elements) can point to http, https, ftp, mailto, and have an enforced rel=nofollow attribute. Does not allow images. whitelist

This whitelist allows the same text tags as , and also allows img tags, with appropriate attributes, with src pointing to http or https.

whitelist

This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, div, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, span, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul

Create a new, empty whitelist.

Create a new, empty whitelist. Generally it will be better to start with a default prepared whitelist instead.

Add a list of allowed elements to a whitelist.

Add a list of allowed elements to a whitelist. (If a tag is not allowed, it will be removed from the HTML.) tag names to allow this (for chaining)

Remove a list of allowed elements from a whitelist.

Remove a list of allowed elements from a whitelist. (If a tag is not allowed, it will be removed from the HTML.) tag names to disallow this (for chaining)

Add a list of allowed attributes to a tag.

Add a list of allowed attributes to a tag. (If an attribute is not allowed on an element, it will be removed.) E.g.: addAttributes("a", "href", "class") allows href and class attributes on a tags. To make an attribute valid for all tags, use the pseudo tag :all, e.g. addAttributes(":all", "class"). Parameters: the tag the attributes are for (the tag will be added to the allowed tag list if necessary); list of valid attributes for the tag. Returns this (for chaining).
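
The per-tag lookup with the :all pseudo tag can be sketched in a few lines. This is a standalone illustration under stated assumptions: the class `AttributeRulesSketch` and its `isAllowed` method are hypothetical, and the real whitelist also handles enforced attributes and protocols, which are left out here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of attribute rules; not the library's Whitelist.
final class AttributeRulesSketch {
    // Allowed attribute keys per tag name; ":all" holds keys valid on every tag.
    private final Map<String, Set<String>> allowed = new HashMap<>();

    AttributeRulesSketch addAttributes(String tag, String... keys) {
        allowed.computeIfAbsent(tag, t -> new HashSet<>()).addAll(Arrays.asList(keys));
        return this; // for chaining
    }

    /** An attribute passes if it is listed for the tag itself or for ":all". */
    boolean isAllowed(String tag, String key) {
        return allowed.getOrDefault(tag, Collections.emptySet()).contains(key)
                || allowed.getOrDefault(":all", Collections.emptySet()).contains(key);
    }
}
```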

Remove a list of allowed attributes from a tag.

Remove a list of allowed attributes from a tag. (If an attribute is not allowed on an element, it will be removed.) E.g.: removeAttributes("a", "href", "class") disallows href and class attributes on a tags. To make an attribute invalid for all tags, use the pseudo tag :all, e.g. removeAttributes(":all", "class"). The tag the attributes are for. List of invalid attributes for the tag this (for chaining)

Add an enforced attribute to a tag.

Add an enforced attribute to a tag. An enforced attribute will always be added to the element. If the element already has the attribute set, it will be overridden. E.g.: addEnforcedAttribute("a", "rel", "nofollow") will make all a tags output as <a href="..." rel="nofollow">. Parameters: the tag the enforced attribute is for (the tag will be added to the allowed tag list if necessary); the attribute key; the enforced attribute value. Returns this (for chaining).

Remove a previously configured enforced attribute from a tag.

The tag the enforced attribute is for. The attribute key this (for chaining)

Configure this Whitelist to preserve relative links in an element's URL attribute, or convert them to absolute links.

Configure this Whitelist to preserve relative links in an element's URL attribute, or convert them to absolute links. By default, this is false: URLs will be made absolute (i.e. start with an allowed protocol, such as http://). Note that when handling relative links, the input document must have an appropriate base URI set when parsing, so that the link's protocol can be confirmed. Regardless of the setting of the preserve relative links option, the link must be resolvable against the base URI to an allowed protocol; otherwise the attribute will be removed. Parameter: true to allow relative links, false (default) to deny. Returns this Whitelist, for chaining.
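
The rule that a link must resolve against the base URI to an allowed protocol can be sketched with java.net.URI. This is an illustration only: the class `LinkSketch`, the method `filterLink`, and the use of null to mean "remove the attribute" are assumptions, not the library's code.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of link filtering; not the library's implementation.
final class LinkSketch {
    /**
     * Resolves href against baseUri. Returns null when the result does not use
     * an allowed protocol (the attribute would be removed); otherwise returns
     * the original href if preserveRelative is set, else the absolute URL.
     */
    static String filterLink(String baseUri, String href,
                             List<String> allowedProtocols, boolean preserveRelative) {
        URI resolved;
        try {
            resolved = URI.create(baseUri).resolve(href);
        } catch (IllegalArgumentException e) {
            return null; // unparseable base URI or link
        }
        String scheme = resolved.getScheme();
        if (scheme == null || !allowedProtocols.contains(scheme.toLowerCase(Locale.ROOT))) {
            return null;
        }
        return preserveRelative ? href : resolved.toString();
    }
}
```

In this sketch the protocol check always runs on the resolved URL, so the preserve option only changes what is written back, not what is accepted.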

Add allowed URL protocols for an element's URL attribute.

Add allowed URL protocols for an element's URL attribute. This restricts the possible values of the attribute to URLs with the defined protocol. E.g.: addProtocols("a", "href", "ftp", "http", "https"). To allow a link to an in-page URL anchor (i.e. <a href="#anchor">), add a #, e.g.: addProtocols("a", "href", "#"). Parameters: tag the URL protocol is for; attribute key; list of valid protocols. Returns this, for chaining.

Remove allowed URL protocols for an element's URL attribute.

Remove allowed URL protocols for an element's URL attribute. E.g.: removeProtocols("a", "href", "ftp") Tag the URL protocol is for Attribute key List of invalid protocols this, for chaining

Test if the supplied tag is allowed by this whitelist

test tag true if allowed

Test if the supplied attribute is allowed by this whitelist for this tag

tag to consider allowing the attribute in; element under test, to confirm protocol; attribute under test. Returns true if allowed.

Collects a list of elements that match the supplied criteria.

Jonathan Hedley

Build a list of elements, by visiting root and every descendant of root, and testing it against the evaluator.

Evaluator to test elements against; root of tree to descend. Returns the list of matches; empty if none.

Base combining (and, or) evaluator.

Create a new Or evaluator.

Create a new Or evaluator. The initial evaluators are ANDed together and used as the first clause of the OR. initial OR clause (these are wrapped into an AND evaluator).

Creates a deep copy of these elements.

a deep copy

Get an attribute value from the first matched element that has the attribute.

The attribute key. Returns the attribute value from the first matched element that has the attribute. If no elements were matched (isEmpty() == true), or if none of the elements have the attribute, returns empty string.

Checks if any of the matched elements have this attribute set.

attribute key true if any of the elements have the attribute; false if none do.

Set an attribute on all matched elements.

attribute key attribute value this

Remove an attribute from every matched element.

The attribute to remove. this (for chaining)

Add the class name to every matched element's class attribute.

class name to add this

Remove the class name from every matched element's class attribute, if present.

class name to remove this

Toggle the class name on every matched element's class attribute.

class name to add if missing, or remove if present, from every element. this

Determine if any of the matched elements have this class name set in their class attribute.

class name to check for true if any do, false if none do

Get the form element's value of the first matched element.

The form element's value, or empty if not set.

Set the form element's value in each of the matched elements.

The value to set into each matched element this (for chaining)

Get the combined inner HTML of all matched elements.

string of all elements' inner HTML.

Get the combined outer HTML of all matched elements.

string of all elements' outer HTML.

Get the combined outer HTML of all matched elements.

Get the combined outer HTML of all matched elements. Alias of . string of all element's outer HTML.

Update the tag name of each matched element.

Update the tag name of each matched element. For example, to change each <i> to an <em> , do doc.select("i").tagName("em"); Parameter: the new tag name. Returns this, for chaining.

Set the inner HTML of each matched element.

HTML to parse and set into each matched element. this, for chaining

Add the supplied HTML to the start of each matched element's inner HTML.

HTML to add inside each element, before the existing HTML this, for chaining

Add the supplied HTML to the end of each matched element's inner HTML.

HTML to add inside each element, after the existing HTML this, for chaining

Insert the supplied HTML before each matched element's outer HTML.

HTML to insert before each element this, for chaining

Insert the supplied HTML after each matched element's outer HTML.

HTML to insert after each element this, for chaining

Wrap the supplied HTML around each matched element.

Wrap the supplied HTML around each matched element. For example, with HTML <p><b>This</b> is <b>jsoup</b></p> , doc.select("b").wrap("<i></i>"); becomes <p><i><b>This</b></i> is <i><b>jsoup</b></i></p> . Parameter: HTML to wrap around each element, e.g. <div class="head"></div> . Can be arbitrarily deep. Returns this (for chaining).

Find matching elements within this element list.

A query the filtered list of elements, or an empty list if none match.

Test if any of the matched elements match the supplied query.

A selector true if at least one element in the list matches the query.

Get all of the parents and ancestor elements of the matched elements.

all of the parents and ancestor elements of the matched elements

Get the first matched element.

The first matched element, or null if contents is empty.

Get the last matched element.

The last matched element, or null if contents is empty.

Perform a depth-first traversal on each of the selected elements.

the visitor callbacks to perform on each node this, for chaining

Get the forms from the selected elements, if any.

a list of form elements pulled from the matched elements. The list will be empty if the elements contain no forms.

Evaluates that an element matches the selector.

Test if the element meets the evaluator's requirements.

Root of the matching subtree; tested element. Returns true if the requirements are met or false otherwise.

Evaluator for tag name

Evaluator for element id

Evaluator for element class

Evaluator for attribute name matching

Evaluator for attribute name prefix matching

Evaluator for attribute name/value matching

Evaluator for attribute name != value matching

Evaluator for attribute name/value matching (value prefix)

Evaluator for attribute name/value matching (value ending)

Evaluator for attribute name/value matching (value containing)

Evaluator for attribute name/value matching (value regex matching)

Abstract evaluator for attribute name/value matching

Evaluator for any / all element matching

Evaluator for matching by sibling index number (e < idx)

Evaluator for matching by sibling index number (e > idx)

Evaluator for matching by sibling index number (e = idx)

Evaluator for matching the last sibling (css :last-child)

css-compatible Evaluator for :eq (css :nth-child)

css pseudo class :nth-last-child

css pseudo class nth-of-type

Evaluator for matching the first sibling (css :first-child)

css3 pseudo-class :root

:root selector

Abstract evaluator for sibling index matching

ant

Evaluator for matching Element (and its descendants) text

Evaluator for matching Element's own text

Evaluator for matching Element's own text with regex

Evaluator for matching Element (and its descendants) text with regex

Depth-first node traversor.

Depth-first node traversor. Use to iterate through all nodes under and including the specified root node. This implementation does not use recursion, so a deep DOM does not risk blowing the stack.

Create a new traversor.

a class implementing the interface, to be called when visiting each node.

Start a depth-first traverse of the root and all of its descendants.

the root node point to traverse.

Node visitor interface.

Node visitor interface. Provide an implementing class to the traversor to iterate through nodes. This interface provides two methods, head and tail . The head method is called when the node is first seen, and the tail method when all of the node's children have been visited. As an example, head can be used to create a start tag for a node, and tail to create the end tag.

Callback for when a node is first visited.

the node being visited. the depth of the node, relative to the root node. E.g., the root node has depth 0, and a child node of that will have depth 1.

Callback for when a node is last visited, after all of its descendants have been visited.

the node being visited. the depth of the node, relative to the root node. E.g., the root node has depth 0, and a child node of that will have depth 1.
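
The head and tail callbacks, driven by a traversal that uses no recursion, can be sketched as follows. Everything here is hypothetical scaffolding for illustration (`TraversalSketch`, `SketchNode`, `SketchVisitor`); it follows the behaviour described above but is not the library's traversor or node model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an iterative depth-first walk with head/tail callbacks.
final class TraversalSketch {
    /** Minimal tree node used only by this sketch. */
    static final class SketchNode {
        final String name;
        SketchNode parent;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String name) { this.name = name; }

        /** Appends a child and returns this node, so calls can be chained. */
        SketchNode add(SketchNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        SketchNode nextSibling() {
            if (parent == null) return null;
            int i = parent.children.indexOf(this);
            return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        }
    }

    /** Visitor with the two callbacks described above. */
    interface SketchVisitor {
        void head(SketchNode node, int depth);
        void tail(SketchNode node, int depth);
    }

    /** Walks root and its descendants using parent and sibling links only. */
    static void traverse(SketchNode root, SketchVisitor visitor) {
        SketchNode node = root;
        int depth = 0;
        while (node != null) {
            visitor.head(node, depth);
            if (!node.children.isEmpty()) {
                node = node.children.get(0); // descend first
                depth++;
            } else {
                // Climb while there is no next sibling, closing nodes on the way up.
                while (node != root && node.nextSibling() == null) {
                    visitor.tail(node, depth);
                    node = node.parent;
                    depth--;
                }
                visitor.tail(node, depth);
                if (node == root) break;
                node = node.nextSibling();
            }
        }
    }

    /** Uses head for start tags and tail for end tags. */
    static String render(SketchNode root) {
        StringBuilder sb = new StringBuilder();
        traverse(root, new SketchVisitor() {
            public void head(SketchNode n, int depth) { sb.append('<').append(n.name).append('>'); }
            public void tail(SketchNode n, int depth) { sb.append("</").append(n.name).append('>'); }
        });
        return sb.toString();
    }
}
```

Because the walk keeps only the current node and a depth counter, a very deep tree costs no extra call-stack frames.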

Parses a CSS selector into an Evaluator tree.

Create a new QueryParser.

CSS query

Parse a CSS query into an Evaluator.

CSS query Evaluator

Parse the query

Evaluator

CSS-like element selector, that finds elements matching a query.

Selector syntax

A selector is a chain of simple selectors, separated by combinators. Selectors are case insensitive (including against elements, attributes, and attribute values). The universal selector (*) is implicit when no element selector is supplied (i.e. *.header and .header are equivalent).

Pattern	Matches	Example
*	any element	*
tag	elements with the given tag name	div
ns|E	elements of type E in the namespace ns	fb|name finds <fb:name> elements
#id	elements with attribute ID of "id"	div#wrap, #logo
.class	elements with a class name of "class"	div.left, .result
[attr]	elements with an attribute named "attr" (with any value)	a[href], [title]
[^attrPrefix]	elements with an attribute name starting with "attrPrefix". Use to find elements with HTML5 datasets	[^data-], div[^data-]
[attr=val]	elements with an attribute named "attr", and value equal to "val"	img[width=500], a[rel=nofollow]
[attr="val"]	elements with an attribute named "attr", and value equal to "val"	span[hello="Cleveland"][goodbye="Columbus"], a[rel="nofollow"]
[attr^=valPrefix]	elements with an attribute named "attr", and value starting with "valPrefix"	a[href^=http:]
[attr$=valSuffix]	elements with an attribute named "attr", and value ending with "valSuffix"	img[src$=.png]
[attr*=valContaining]	elements with an attribute named "attr", and value containing "valContaining"	a[href*=/search/]
[attr~=regex]	elements with an attribute named "attr", and value matching the regular expression	img[src~=(?i)\\.(png|jpe?g)]
	The above may be combined in any order	div.header[title]
Combinators
E F	an F element descended from an E element	div a, .logo h1
E > F	an F direct child of E	ol > li
E + F	an F element immediately preceded by sibling E	li + li, div.head + div
E ~ F	an F element preceded by sibling E	h1 ~ p
E, F, G	all matching elements E, F, or G	a[href], div, h3
Pseudo selectors
:lt(n)	elements whose sibling index is less than n	td:lt(3) finds the first 3 cells of each row
:gt(n)	elements whose sibling index is greater than n	td:gt(1) finds cells after skipping the first two
:eq(n)	elements whose sibling index is equal to n	td:eq(0) finds the first cell of each row
:has(selector)	elements that contain at least one element matching the selector	div:has(p) finds divs that contain p elements
:not(selector)	elements that do not match the selector.	div:not(.logo) finds all divs that do not have the "logo" class. div:not(:has(div)) finds divs that do not contain divs.
:contains(text)	elements that contain the specified text. The search is case insensitive. The text may appear in the found element, or any of its descendants.	p:contains(jsoup) finds p elements containing the text "jsoup".
:matches(regex)	elements whose text matches the specified regular expression. The text may appear in the found element, or any of its descendants.	td:matches(\\d+) finds table cells containing digits. div:matches((?i)login) finds divs containing the text, case insensitively.
:containsOwn(text)	elements that directly contain the specified text. The search is case insensitive. The text must appear in the found element, not any of its descendants.	p:containsOwn(jsoup) finds p elements with own text "jsoup".
:matchesOwn(regex)	elements whose own text matches the specified regular expression. The text must appear in the found element, not any of its descendants.	td:matchesOwn(\\d+) finds table cells directly containing digits. div:matchesOwn((?i)login) finds divs containing the text, case insensitively.
	The above may be combined in any order and with other selectors	.light:contains(name):eq(0)
Structural pseudo selectors
:root	The element that is the root of the document. In HTML, this is the html element	:root
:nth-child(an+b)	elements that have an+b-1 siblings before them in the document tree, for any positive integer or zero value of n, and that have a parent element. For values of a and b greater than zero, this effectively divides the element's children into groups of a elements (the last group taking the remainder) and selects the bth element of each group. For example, this allows the selectors to address every other row in a table, and could be used to alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1. In addition, :nth-child() can take odd and even as arguments instead: odd has the same meaning as 2n+1, and even has the same meaning as 2n.	tr:nth-child(2n+1) finds every odd row of a table. :nth-child(10n-1) the 9th, 19th, 29th, etc, element. li:nth-child(5) the 5th li
:nth-last-child(an+b)	elements that have an+b-1 siblings after it in the document tree. Otherwise like :nth-child()	tr:nth-last-child(-n+2) the last two rows of a table
:nth-of-type(an+b)	pseudo-class notation represents an element that has an+b-1 siblings with the same expanded element name before it in the document tree, for any zero or positive integer value of n, and has a parent element	img:nth-of-type(2n+1)
:nth-last-of-type(an+b)	pseudo-class notation represents an element that has an+b-1 siblings with the same expanded element name after it in the document tree, for any zero or positive integer value of n, and has a parent element	img:nth-last-of-type(2n+1)
:first-child	elements that are the first child of some other element.	div > p:first-child
:last-child	elements that are the last child of some other element.	ol > li:last-child
:first-of-type	elements that are the first sibling of its type in the list of children of its parent element	dl dt:first-of-type
:last-of-type	elements that are the last sibling of its type in the list of children of its parent element	tr > td:last-of-type
:only-child	elements that have a parent element and whose parent element has no other element children
:only-of-type	an element that has a parent element and whose parent element has no other element children with the same expanded element name
:empty	elements that have no children at all
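
The an+b rule used by the structural pseudo selectors above can be checked with a small predicate. This is a sketch under stated assumptions: the class `NthSketch` and its method are hypothetical, and positions are taken to be 1-based as stated for :nth-child.

```java
// Hypothetical sketch of the an+b position test; not the library's evaluator.
final class NthSketch {
    /** True if the 1-based position equals a*n + b for some integer n >= 0. */
    static boolean matches(int a, int b, int position) {
        if (a == 0) {
            return position == b; // plain index, e.g. :nth-child(5)
        }
        int diff = position - b;
        // diff must be a*n with n >= 0: same sign as a (or zero) and divisible by a.
        return (long) diff * a >= 0 && diff % a == 0;
    }
}
```

For example, a=2 and b=1 accepts positions 1, 3, 5 and so on, while a=-1 and b=2 accepts only positions 1 and 2, which is how -n+2 selects two items from one end of a list.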

Jonathan Hedley, jonathan@hedley.net

Find elements matching selector.

CSS selector; root element to descend into. Returns matching elements, empty if none.

Find elements matching selector.

CSS selector; root element to descend into. Returns matching elements, empty if none.

Find elements matching selector.

CSS selector; root elements to descend into. Returns matching elements, empty if none.

Base structural evaluator.

A SerializationException is raised whenever serialization of a DOM element fails.

A SerializationException is raised whenever serialization of a DOM element fails. This exception usually wraps an that may be thrown due to an inaccessible output stream.

Creates and initializes a new serialization exception with no error message and cause.

Creates and initializes a new serialization exception with the given error message and no cause.

the error message of the new serialization exception (may be null).

Creates and initializes a new serialization exception with the specified cause and an error message of (cause==null ? null : cause.toString()) (which typically contains the class and error message of cause).

the cause of the new serialization exception (may be null).

Creates and initializes a new serialization exception with the given error message and cause.

the error message of the new serialization exception. the cause of the new serialization exception.
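
The message rule for the cause-only constructor matches what the JDK's RuntimeException(Throwable) constructor does. The sketch below assumes an unchecked exception built on RuntimeException; the class name `SketchSerializationException` is hypothetical and the real class may be declared differently.

```java
// Hypothetical sketch of the constructors described above, assuming a RuntimeException base.
final class SketchSerializationException extends RuntimeException {
    /** Message only, no cause. */
    SketchSerializationException(String message) {
        super(message);
    }

    /** Cause only: the message becomes (cause == null ? null : cause.toString()). */
    SketchSerializationException(Throwable cause) {
        super(cause);
    }

    /** Both message and cause. */
    SketchSerializationException(String message, Throwable cause) {
        super(message, cause);
    }
}
```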

Signals that a HTTP response returned a mime type that is not supported.

Class that bundles all the error message templates as constants.

The Constant DEFAULT_VALUE_OF_CSS_PROPERTY_UNKNOWN.

The Constant ERROR_ADDING_CHILD_NODE.

The Constant ERROR_PARSING_COULD_NOT_MAP_NODE.

The Constant ERROR_PARSING_CSS_SELECTOR.

The Constant UNKNOWN_ABSOLUTE_METRIC_LENGTH_PARSED.

The Constant QUOTES_PROPERTY_INVALID.

The Constant QUOTE_IS_NOT_CLOSED_IN_CSS_EXPRESSION.

The Constant INVALID_CSS_PROPERTY_DECLARATION.

The Constant RULE_IS_NOT_SUPPORTED.

The Constant UNABLE_TO_RETRIEVE_IMAGE_WITH_GIVEN_BASE_URI.

The Constant UNABLE_TO_RETRIEVE_STREAM_WITH_GIVEN_BASE_URI.

The Constant WAS_NOT_ABLE_TO_DEFINE_BACKGROUND_CSS_SHORTHAND_PROPERTIES.

The Constant ERROR_RESOLVING_PARENT_STYLES.

Instantiates a new log message constant.

Interface for HTML attributes.

Gets the key.

the key

Gets the value.

the value

Interface for a series of HTML attributes.

Gets the value of an attribute, given a key.

the key. Returns the attribute.

Adds a key and a value of an attribute.

the key; the value

Returns the number of attributes.

the number of attributes

Interface that serves as a marker indicating that this particular is something non-standard.

Interface for data nodes.

Gets all the data.

the data

Interface implemented by classes that are a top node, and as such represent a Document.

Interface for the document type node.

Interface for node classes that have a parent and children, and for which styles can be defined; each of these nodes can also have a name and attributes.

Gets the name of the element node.

the string

Gets the attributes.

the attributes

Gets an attribute.

the key of the attribute we want to get. Returns the value of the attribute.

Gets additional styles, more specifically styles that affect an element based on its position in the HTML DOM, e.g. cell borders that are set due to the parent table "border" attribute, or styles from "col" tags that affect table elements, or blocks horizontal alignment that is the result of parent's "align" attribute.

the additional html styles

Adds additional HTML styles.

the styles

Gets the language.

the language value

Class that uses JSoup to parse HTML.

The logger.

Wraps JSoup nodes into pdfHTML classes.

the JSoup node instance the instance

Class that uses JSoup to parse HTML.

The logger.

Wraps JSoup nodes into pdfHTML classes.

the JSoup node instance the instance

Implementation of the interface; wrapper for the JSoup class.

The JSoup instance.

Creates a new instance.

the attribute

Implementation of the interface; wrapper for the JSoup class.

The JSoup data node instance.

Creates a new instance.

the data node

Implementation of the interface; wrapper for the JSoup class.

The JSoup document instance.

Creates a new instance.

the document

Gets the JSoup document.

the document

Implementation of the interface; wrapper for the JSoup class.

Creates a new instance.

the node

Implementation of the interface; wrapper for the JSoup class.

The JSoup element.

The attributes.

The resolved styles.

The custom default styles.

The language.

Creates a new instance.

the element

Returns the element text.

the text

Implementation of the interface; wrapper for the JSoup class.

The JSoup node instance.

The child nodes.

The parent node.

Creates a new instance.

the node

Implementation of the interface; wrapper for the JSoup class.

The text node.

Creates a new instance.

the text node

Interface for classes that describe a Node with a parent and children.

Gets the child nodes.

a list of node instances.

Adds a child node.

a child node that will be added to the current node

Gets the parent node.

the parent node

Interface for classes that can get and set styles.

Sets the styles.

a map with style keys and values.

Gets the styles.

the styles

Interface for text nodes.

Returns all the text.

the text

This file is a helper class for internal usage only. Be aware that its API and functionality may be changed in the future.

By default, the "." symbol in regular expressions does not match line terminators. The issue is further complicated by the fact that in C# "." fails to match only "\n", while in Java it also fails to match several other characters. This utility method creates a pattern in which dots match any character, including line terminators.

Parameter: the regular expression string. Returns: a pattern in which dot characters match any Unicode character, including line terminators.
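The dot behaviour described above can be illustrated with a minimal sketch using the standard java.util.regex API. The class and method names below (DotAllPatternSketch, dotAllPattern) are hypothetical and not taken from the library; the sketch only assumes that compiling with Pattern.DOTALL is one way to obtain such a pattern.

```java
import java.util.regex.Pattern;

final class DotAllPatternSketch {

    // Hypothetical helper: compiles a pattern in which "." also matches line terminators.
    static Pattern dotAllPattern(String regex) {
        return Pattern.compile(regex, Pattern.DOTALL);
    }

    public static void main(String[] args) {
        String text = "a\nb";
        // Default mode: "." does not match the line terminator "\n".
        System.out.println(Pattern.compile("a.b").matcher(text).matches()); // prints false
        // DOTALL mode: "." matches any character, including "\n".
        System.out.println(dotAllPattern("a.b").matcher(text).matches());   // prints true
    }
}
```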

A basic font provider that allows configuring in the constructor which fonts are loaded by default.

Creates a new instance.

Parameters: true to register the standard Type 1 fonts (which can't be embedded); true to register the system fonts (which can require quite some resources).

Creates a new instance.

Parameters: true to register the standard Type 1 fonts (which can't be embedded); true to register the system fonts (which can require quite some resources); the default font family.

Creates a new instance.

Parameters: a predefined set of fonts, which may be null; the default font family.

Utilities class to resolve resources.

Identifier string used when loading in base64 images

The instance.

Creates an instance.

Creates an instance. If the base URI is a string that represents an absolute URI with any scheme except "file", resource URL values are resolved exactly as "new URL(baseUrl, uriString)". Otherwise, the base URI is handled as a path in the local file system. If an empty string or a relative URI string is passed as the base URI, it is resolved against the current working directory of this application instance. Parameter: the base URI against which all relative resource URIs will be resolved.
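The "new URL(baseUrl, uriString)" resolution mentioned above can be sketched with the plain java.net.URL API. The class name, method name, and example addresses are hypothetical; the sketch covers only the absolute, non-"file" case and does not attempt to reproduce the local file system handling.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class BaseUriResolutionSketch {

    // Hypothetical helper: resolves uriString against an absolute base URI.
    @SuppressWarnings("deprecation") // the URL constructors are deprecated in recent JDKs
    static String resolve(String baseUri, String uriString) {
        try {
            URL baseUrl = new URL(baseUri);
            return new URL(baseUrl, uriString).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + uriString, e);
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        // A relative reference replaces the last path segment of the base.
        System.out.println(resolve(base, "css/style.css")); // prints http://example.com/docs/css/style.css
        // A reference starting with "/" is resolved against the host root.
        System.out.println(resolve(base, "/img/logo.png")); // prints http://example.com/img/logo.png
    }
}
```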

Retrieves a PdfImageXObject.

Parameter: either a link to a file or a base64-encoded stream. Returns: a PdfImageXObject on success, otherwise null.

Retrieve image as either , or .

Parameter: either a link to a file or a base64-encoded stream. Returns: a PdfImageXObject on success, otherwise null.

Opens an input stream to a style sheet URI.

Parameter: the URI. Returns: the input stream.

Deprecated: use retrieveBytesFromResource instead. Replaced by retrieveBytesFromResource for the sake of method name clarity.

Deprecated: use retrieveBytesFromResource instead. Replaced by retrieveBytesFromResource for the sake of method name clarity. Retrieves a resource as a byte array from a source that can be either a link to a file or a base64-encoded stream. Parameter: either a link to a file or a base64-encoded stream. Returns: byte[] on success, otherwise null.

Retrieves a resource as a byte array from a source that can be either a link to a file or a base64-encoded stream.

Parameter: either a link to a file or a base64-encoded stream. Returns: byte[] on success, otherwise null.

Retrieves the resource found in src as an InputStream.

Parameter: the path to the resource. Returns: an InputStream for the resource.

Checks if string contains base64 mark.

Checks if string contains base64 mark. It does not guarantee that src is a correct base64 data-string. Parameter: the string to test.

Resolves a given URI against the base URI.

the uri the url

Resets the simple image cache.

Checks whether the type of the image located at the passed location is supported.

Parameter: the location of the image resource. Returns: true if the image type is supported, false otherwise.

Creates an iText XObject based on the image stored at the passed location.

location of the Image file containing the Image loaded in

Checks if source is under data URI scheme.

Checks if source is under data URI scheme (e.g. data:[<media type>][;base64],<data>). Parameter: the string to test.
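The two checks described above (base64 mark and data URI scheme) can be approximated as follows. This is a rough sketch under stated assumptions: the class name, the method names, and the exact marker string "base64" are illustrative and not confirmed by the source, and, like the original, neither check validates the payload.

```java
import java.util.Locale;

final class DataUriSketch {

    // Assumed marker; the source only says an identifier string is used for base64 images.
    private static final String BASE64_MARK = "base64";

    // True if the string mentions the base64 mark; the data itself is not validated.
    static boolean containsBase64Mark(String src) {
        return src != null && src.contains(BASE64_MARK);
    }

    // True if the string looks like data:[<media type>][;base64],<data>.
    static boolean isDataSrc(String src) {
        return src != null
                && src.toLowerCase(Locale.ROOT).startsWith("data:")
                && src.indexOf(',') > 0;
    }

    public static void main(String[] args) {
        System.out.println(isDataSrc("data:image/png;base64,iVBORw0KGgo="));   // prints true
        System.out.println(isDataSrc("images/logo.png"));                      // prints false
        System.out.println(containsBase64Mark("data:text/plain,hello"));       // prints false
    }
}
```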

Simple implementation of an image cache.

The cache mapping a source path to an Image XObject.

Stores how many times each image is used.

The capacity of the cache.

Creates a new instance.

the capacity

Adds an image to the cache.

the source path the image XObject to be cached

Gets an image from the cache.

the source path the image XObject

Gets the size of the cache.

the cache size

Resets the cache.

Ensures the capacity of the cache by removing the least important images (based on the number of times an image is used).
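The cache behaviour described above (a source-to-image map, a usage counter, a fixed capacity, and eviction of the least used entry) can be sketched generically. This is not the library's implementation: the class name is hypothetical, a type parameter V stands in for the image XObject, and counting every lookup as one use is an assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class UsageCountCacheSketch<V> {
    private final Map<String, V> cache = new LinkedHashMap<>();
    private final Map<String, Integer> usage = new HashMap<>();
    private final int capacity;

    UsageCountCacheSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    // Adds a value under its source path, evicting the least used entry if the cache is full.
    void put(String source, V value) {
        if (cache.containsKey(source)) {
            return;
        }
        ensureCapacity();
        cache.put(source, value);
    }

    // Returns the cached value (or null) and records one use of the source path.
    V get(String source) {
        usage.merge(source, 1, Integer::sum);
        return cache.get(source);
    }

    int size() {
        return cache.size();
    }

    void reset() {
        cache.clear();
        usage.clear();
    }

    // Removes the entry with the lowest usage count when the capacity is reached.
    private void ensureCapacity() {
        if (cache.size() < capacity) {
            return;
        }
        String leastUsed = null;
        int min = Integer.MAX_VALUE;
        for (String key : cache.keySet()) {
            int count = usage.getOrDefault(key, 0);
            if (count < min) {
                min = count;
                leastUsed = key;
            }
        }
        cache.remove(leastUsed);
    }
}
```

With a capacity of 2, adding "a" and "b", reading "a" twice and "b" once, and then adding "c" evicts "b", because it has the lower usage count.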

Set of 256 characters with the bits that don't need encoding set to on.

The difference between the value of a lowercase character and the value of its uppercase counterpart.

The default encoding ("UTF-8").

Encodes a string in the default encoding and default URI scheme to an HTML-encoded string.

Parameter: the original string. Returns: the encoded string.

Encodes a string in a specific encoding and specific URI scheme to an HTML-encoded string.

Parameters: the original string; the encoding. Returns: the encoded string.
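The bit-set approach hinted at above (256 bits marking the characters that need no encoding) can be sketched as a simple percent-encoder. The class name and the choice of safe characters are assumptions: this sketch uses the RFC 3986 unreserved set, which is likely narrower than the set the library keeps unencoded, and it uppercases hex digits with Character.toUpperCase rather than a lowercase/uppercase offset constant.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class UriEncodeSketch {
    // Bits set to on for the byte values that are emitted unchanged (assumed set).
    private static final BitSet SAFE = new BitSet(256);

    static {
        for (char c = 'a'; c <= 'z'; c++) SAFE.set(c);
        for (char c = 'A'; c <= 'Z'; c++) SAFE.set(c);
        for (char c = '0'; c <= '9'; c++) SAFE.set(c);
        for (char c : "-_.~".toCharArray()) SAFE.set(c);
    }

    // Percent-encodes every UTF-8 byte of the input that is not marked as safe.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (SAFE.get(v)) {
                out.append((char) v);
            } else {
                out.append('%');
                out.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                out.append(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b"));    // prints a%20b
        System.out.println(encode("\u00e9")); // prints %C3%A9
    }
}
```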

Utilities class to resolve URIs.

The base url.

Indicates if the Uri refers to a local resource.

Creates a new instance.

the base URI

Gets the base URI.

the base uri

Resolve a given URI against the base URI.

the given URI the resolved URI

Resolves the base URI to a URL or path.

the base URI

Resolves a base URI as a URL.

the base URI the URL, or null if not successful

Resolves a base URI as a file URL.

the base URI the file URL

Check if baseURI is local

true if baseURI is local, otherwise false

List to store the properties whose value can depend on parent or element font-size

Merge parent CSS declarations.

Parameters: the styles map; the CSS property; the parent property's value; the set of inheritance rules. Returns: a map of updated styles after merging parent and child style declarations.

Checks all inheritance rule-sets to see if the passed property is inheritable.

Parameters: the property identifier to check; a set of inheritance rules. Returns: true if the property is inheritable by one of the rule-sets, false if it is not marked as inheritable in any of the rule-sets.

Checks whether the passed value is a measurement of the type indicated by the passed measurement symbol string.

Parameters: the string containing the value to check; the measurement symbol (e.g. % for relative, px for pixels). Returns: true if the value is numerical and ends with the measurement symbol, false otherwise.
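The measurement check described above ("numerical and ends with the measurement symbol") can be sketched as follows. The class name, method name, and the exact definition of "numerical" (an optionally signed decimal number) are assumptions made for illustration.

```java
import java.util.regex.Pattern;

final class MeasurementCheckSketch {
    // Assumed notion of "numerical": optional sign, digits, optional fraction.
    private static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)");

    // True if value is a number followed by the given measurement symbol.
    static boolean valueIsOfMeasurement(String value, String measurement) {
        if (value == null || measurement == null) {
            return false;
        }
        String trimmed = value.trim();
        if (!trimmed.endsWith(measurement)) {
            return false;
        }
        String number = trimmed.substring(0, trimmed.length() - measurement.length()).trim();
        return NUMBER.matcher(number).matches();
    }

    public static void main(String[] args) {
        System.out.println(valueIsOfMeasurement("10px", "px")); // prints true
        System.out.println(valueIsOfMeasurement("50%", "%"));   // prints true
        System.out.println(valueIsOfMeasurement("auto", "px")); // prints false
    }
}
```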

Utility class for white-space handling methods that are used both in pdfHTML and the iText-core SVG module.

Collapses all consecutive spaces of the passed String into single spaces.

Parameter: the String to collapse. Returns: a String containing the contents of the input, with consecutive spaces collapsed.

Checks if a character is a white space value that is not an em, en or similar special whitespace character.

Parameter: the character. Returns: true if the character is a white space character, but not em, en or similar.
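The two white-space helpers described above can be sketched as follows. The class and method names are hypothetical, and two points are assumptions rather than documented behaviour: which characters count as "similar" to em and en spaces (here only the thin space), and that a run of ordinary white space of any kind collapses to a single space character.

```java
final class WhiteSpaceSketch {
    private static final char EN_SPACE = '\u2002';
    private static final char EM_SPACE = '\u2003';
    private static final char THIN_SPACE = '\u2009';

    // True for white space that is not an em, en or (assumed) thin space.
    static boolean isNonEmSpace(char ch) {
        return ch != EM_SPACE && ch != EN_SPACE && ch != THIN_SPACE
                && Character.isWhitespace(ch);
    }

    // Replaces each run of ordinary white space with a single space character.
    static String collapseConsecutiveSpaces(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean previousWasSpace = false;
        for (char ch : s.toCharArray()) {
            if (isNonEmSpace(ch)) {
                if (!previousWasSpace) {
                    sb.append(' ');
                }
                previousWasSpace = true;
            } else {
                sb.append(ch);
                previousWasSpace = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapseConsecutiveSpaces("a  \t b")); // prints a b
        System.out.println(isNonEmSpace('\u2003'));                // prints false
    }
}
```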

Selector syntax

Combinators

Pseudo selectors

Structural pseudo selectors