Types Reference

Types Reference¶

All types defined by the library, grouped by category. Types are shown using Rust as the canonical representation.

Result Types¶

ConversionResult¶

The primary result of HTML conversion and extraction.

Contains the converted text output, optional structured document tree, metadata, extracted tables, images, and processing warnings.

Field	Type	Default	Description
`content`	`Option<String>`	`Default::default()`	Converted text output (markdown, djot, or plain text). `None` when `output_format` is set to `OutputFormat.None`, indicating extraction-only mode.
`document`	`Option<DocumentStructure>`	`Default::default()`	Structured document tree with semantic elements. Populated when `ConversionOptions.include_document_structure` is `true`. `None` otherwise (the default), which avoids the overhead of building the tree. When present, the tree mirrors the converted document: headings open `Group` sections, paragraphs and list items carry inline `TextAnnotation`s, and tables reference the same `TableGrid` data exposed in `Self.tables`. Note: this field is independent of the `metadata` feature flag. Document structure collection is always available at runtime; it is gated only by the runtime option, not by a compile-time feature.
`metadata`	`HtmlMetadata`	—	Extracted HTML metadata (title, OG, links, images, structured data).
`tables`	`Vec<TableData>`	`vec![]`	Extracted tables with structured cell data and markdown representation.
`images`	`Vec<String>`	`vec![]`	Extracted inline images (data URIs and SVGs). Populated when `extract_images` is `true` in options.
`warnings`	`Vec<ProcessingWarning>`	`vec![]`	Non-fatal processing warnings.

Configuration Types¶

See Configuration Reference for detailed defaults and language-specific representations.

ConversionOptions¶

Main conversion options for HTML to Markdown conversion.

Use ConversionOptions.builder() to construct, or the default constructor for defaults.

Field	Type	Default	Description
`heading_style`	`HeadingStyle`	`HeadingStyle::Atx`	Heading style to use in Markdown output (ATX `#` or Setext underline).
`list_indent_type`	`ListIndentType`	`ListIndentType::Spaces`	How to indent nested list items (spaces or tab).
`list_indent_width`	`usize`	`2`	Number of spaces (or tabs) to use for each level of list indentation.
`bullets`	`String`	`"-*+"`	Bullet character(s) to use for unordered list items (e.g. `"-"`, `"*"`).
`strong_em_symbol`	`String`	`"*"`	Character used for bold/italic emphasis markers (`*` or `_`).
`escape_asterisks`	`bool`	`false`	Escape `*` characters in plain text to avoid unintended bold/italic.
`escape_underscores`	`bool`	`false`	Escape `_` characters in plain text to avoid unintended bold/italic.
`escape_misc`	`bool`	`false`	Escape miscellaneous Markdown metacharacters (`[]()#` etc.) in plain text.
`escape_ascii`	`bool`	`false`	Escape ASCII characters that have special meaning in certain Markdown dialects.
`code_language`	`String`	`""`	Default language annotation for fenced code blocks that have no language hint.
`autolinks`	`bool`	`true`	Automatically convert bare URLs into Markdown autolinks.
`default_title`	`bool`	`false`	Emit a default title when no `<title>` tag is present.
`br_in_tables`	`bool`	`false`	Render `<br>` elements inside table cells as literal line breaks.
`compact_tables`	`bool`	`false`	Emit tables without column padding (compact GFM format). When `true`, column widths are not computed and cells are emitted with no trailing spaces. Separator rows use exactly `---` per column. Produces token-efficient output suitable for RAG / LLM contexts. Default `false` (aligned padding preserved).
`highlight_style`	`HighlightStyle`	`HighlightStyle::DoubleEqual`	Style used for `<mark>` / highlighted text (e.g. `==text==`).
`extract_metadata`	`bool`	`true`	Populate `result.metadata` with `<head>` / `<meta>` extraction (title, description, Open Graph, Twitter Card, JSON-LD, …). Default `true`. Disabling skips the metadata pass only — table extraction into `result.tables` runs unconditionally.
`whitespace_mode`	`WhitespaceMode`	`WhitespaceMode::Normalized`	Controls how whitespace sequences are normalised in the converted output. - `WhitespaceMode.Normalized` (default) — collapses consecutive whitespace characters (spaces, tabs, newlines) to a single space, matching browser rendering behaviour. - `WhitespaceMode.Strict` — preserves all whitespace exactly as it appears in the source HTML, including runs of spaces and embedded newlines. Choose `Strict` only when the source HTML uses deliberate whitespace (e.g. pre-formatted content outside `<pre>` tags). For most documents `Normalized` produces cleaner output.
`strip_newlines`	`bool`	`false`	Strip all newlines from the output, producing a single-line result.
`wrap`	`bool`	`false`	Wrap long lines at `wrap_width` characters.
`wrap_width`	`usize`	`80`	Maximum output line width in characters when `wrap` is `true` (default `80`). Lines are broken at word boundaries so that no line exceeds this length. A value of `0` is treated as "no limit" — equivalent to leaving `wrap` disabled. Has no effect when `wrap` is `false`.
`convert_as_inline`	`bool`	`false`	Treat the entire document as inline content (no block-level wrappers).
`sub_symbol`	`String`	`""`	Markdown notation for subscript text (e.g. `"~"`).
`sup_symbol`	`String`	`""`	Markdown notation for superscript text (e.g. `"^"`).
`newline_style`	`NewlineStyle`	`NewlineStyle::Spaces`	How to encode hard line breaks (`<br>`) in Markdown.
`code_block_style`	`CodeBlockStyle`	`CodeBlockStyle::Backticks`	Style used for fenced code blocks (backticks or tilde).
`keep_inline_images_in`	`Vec<String>`	`vec![]`	HTML tag names whose `<img>` children are kept inline instead of block.
`preprocessing`	`PreprocessingOptions`	—	Options for the HTML pre-processing pass applied before conversion begins. Pre-processing runs before the HTML is handed to the converter and can perform operations such as unwrapping redundant wrapper elements, removing tracking pixels, and normalising vendor-specific markup. See `PreprocessingOptions` for the full set of knobs. Defaults to `PreprocessingOptions.default()`, which enables the standard cleaning passes. Set individual fields on `PreprocessingOptions` (or construct via `ConversionOptions.builder`) to opt in or out of specific passes.
`encoding`	`String`	`"utf-8"`	Expected character encoding of the input HTML (default `"utf-8"`).
`debug`	`bool`	`false`	Emit debug information during conversion.
`strip_tags`	`Vec<String>`	`vec![]`	HTML tag names whose content is stripped from the output entirely.
`preserve_tags`	`Vec<String>`	`vec![]`	HTML tag names that are preserved verbatim in the output.
`skip_images`	`bool`	`false`	Skip conversion of `<img>` elements (omit images from output).
`link_style`	`LinkStyle`	`LinkStyle::Inline`	Link rendering style (inline or reference).
`output_format`	`OutputFormat`	`OutputFormat::Markdown`	Target output format (Markdown, plain text, etc.).
`include_document_structure`	`bool`	`false`	Include structured document tree in result.
`extract_images`	`bool`	`false`	Extract inline images from data URIs and SVGs.
`max_image_size`	`u64`	`5242880`	Maximum decoded image size in bytes (default 5MB).
`capture_svg`	`bool`	`false`	Capture SVG elements as images.
`infer_dimensions`	`bool`	`true`	Infer image dimensions from data.
`max_depth`	`Option<usize>`	`None`	Maximum DOM traversal depth. `None` means unlimited. When set, subtrees beyond this depth are silently truncated.
`exclude_selectors`	`Vec<String>`	`vec![]`	CSS selectors for elements to exclude entirely (element + all content). Unlike `strip_tags` (which removes the tag wrapper but keeps children), excluded elements and all their descendants are dropped from the output. Supports any CSS selector that `tl` supports: tag names, `.class`, `#id`, `[attribute]`, etc. Invalid selectors are silently skipped at conversion time. Example: `vec![".cookie-banner".into(), "#ad-container".into(), "[role='complementary']".into()]`
`visitor`	`Option<VisitorHandle>`	`None`	Optional visitor for custom traversal logic. When set, the visitor's callbacks are invoked for matching HTML elements during conversion, allowing custom output, skipping, or HTML preservation. See `HtmlVisitor`.

PreprocessingOptions¶

HTML preprocessing options for document cleanup before conversion.

Field	Type	Default	Description
`enabled`	`bool`	`true`	Enable HTML preprocessing globally
`preset`	`PreprocessingPreset`	`PreprocessingPreset::Standard`	Preprocessing preset level (Minimal, Standard, Aggressive)
`remove_navigation`	`bool`	`true`	Remove navigation elements (nav, breadcrumbs, menus, sidebars)
`remove_forms`	`bool`	`true`	Remove form elements (forms, inputs, buttons, etc.)

TableGrid¶

A structured table grid with cell-level data including spans.

Field	Type	Default	Description
`rows`	`u32`	—	Number of rows.
`cols`	`u32`	—	Number of columns.
`cells`	`Vec<GridCell>`	`vec![]`	All cells in the table as a flat, sparse list. The list is ordered by `(row, col)` but is not a dense `rows × cols` matrix: cells that are covered by a spanning cell (via `row_span > 1` or `col_span > 1`) do not appear in the list. Only the top-left "origin" cell of a span is present, with its `row_span` and `col_span` fields set accordingly. To reconstruct the full visual grid, iterate over all cells and mark the rectangular region `[row .. row+row_span, col .. col+col_span]` as occupied by that cell. Any `(row, col)` position that is not the origin of any cell is covered by a span from an earlier cell. The length of this vec is `≤ rows * cols`. An empty table (`rows == 0 \\|\\| cols == 0`) produces an empty vec.

Metadata Types¶

DocumentMetadata¶

Document-level metadata extracted from <head> and top-level elements.

Contains all metadata typically used by search engines, social media platforms, and browsers for document indexing and presentation.

Field	Type	Default	Description
`title`	`Option<String>`	`Default::default()`	Document title from `<title>` tag
`description`	`Option<String>`	`Default::default()`	Document description from `<meta name="description">` tag
`keywords`	`Vec<String>`	`vec![]`	Document keywords from `<meta name="keywords">` tag, split on commas
`author`	`Option<String>`	`Default::default()`	Document author from `<meta name="author">` tag
`canonical_url`	`Option<String>`	`Default::default()`	Canonical URL from `<link rel="canonical">` tag
`base_href`	`Option<String>`	`Default::default()`	Base URL from `<base href="">` tag for resolving relative URLs
`language`	`Option<String>`	`Default::default()`	Document language from `lang` attribute
`text_direction`	`Option<TextDirection>`	`Default::default()`	Document text direction from `dir` attribute
`open_graph`	`HashMap<String, String>`	`HashMap::new()`	Open Graph metadata (og:* properties) for social media Keys like "title", "description", "image", "url", etc.
`twitter_card`	`HashMap<String, String>`	`HashMap::new()`	Twitter Card metadata (twitter:* properties) Keys like "card", "site", "creator", "title", "description", "image", etc.
`meta_tags`	`HashMap<String, String>`	`HashMap::new()`	Additional meta tags not covered by specific fields Keys are meta name/property attributes, values are content

HeaderMetadata¶

Header element metadata with hierarchy tracking.

Captures heading elements (h1-h6) with their text content, identifiers, and position in the document structure.

Field	Type	Default	Description
`level`	`u8`	—	Header level: 1 (h1) through 6 (h6)
`text`	`String`	—	Normalized text content of the header
`id`	`Option<String>`	`None`	HTML id attribute if present
`depth`	`usize`	—	Document tree depth at the header element
`html_offset`	`usize`	—	Byte offset in original HTML document

LinkMetadata¶

Hyperlink metadata with categorization and attributes.

Represents <a> elements with parsed href values, text content, and link type classification.

Field	Type	Default	Description
`href`	`String`	—	The href URL value
`text`	`String`	—	Link text content (normalized, concatenated if mixed with elements)
`title`	`Option<String>`	`None`	Optional title attribute (often shown as tooltip)
`link_type`	`LinkType`	—	Link type classification
`rel`	`Vec<String>`	—	Rel attribute values (e.g., "nofollow", "stylesheet", "canonical")
`attributes`	`HashMap<String, String>`	—	Additional HTML attributes

ImageMetadata¶

Image metadata with source and dimensions.

Captures <img> elements and inline <svg> elements with metadata for image analysis and optimization.

Field	Type	Default	Description
`src`	`String`	—	Image source (URL, data URI, or SVG content identifier)
`alt`	`Option<String>`	`None`	Alternative text from alt attribute (for accessibility)
`title`	`Option<String>`	`None`	Title attribute (often shown as tooltip)
`dimensions`	`Vec<u32>`	`None`	Image dimensions as (width, height) if available
`image_type`	`ImageType`	—	Image type classification
`attributes`	`HashMap<String, String>`	—	Additional HTML attributes

HtmlMetadata¶

Comprehensive metadata extraction result from HTML document.

Contains all extracted metadata types in a single structure, suitable for serialization and transmission across language boundaries.

Field	Type	Default	Description
`document`	`DocumentMetadata`	—	Document-level metadata (title, description, canonical, etc.)
`headers`	`Vec<HeaderMetadata>`	`vec![]`	Extracted header elements with hierarchy
`links`	`Vec<LinkMetadata>`	`vec![]`	Extracted hyperlinks with type classification
`images`	`Vec<ImageMetadata>`	`vec![]`	Extracted images with source and dimensions
`structured_data`	`Vec<StructuredData>`	`vec![]`	Extracted structured data blocks

Document Structure¶

DocumentStructure¶

A structured document tree representing the semantic content of an HTML document.

Uses a flat node array with index-based parent/child references for efficient traversal.

Field	Type	Default	Description
`nodes`	`Vec<DocumentNode>`	—	All nodes in document reading order.
`source_format`	`Option<String>`	`None`	The source format (always "html" for this library).

DocumentNode¶

A single node in the document tree.

Field	Type	Default	Description
`id`	`String`	—	Deterministic node identifier.
`content`	`NodeContent`	—	The semantic content of this node.
`parent`	`Option<u32>`	`None`	Index of the parent node (None for root nodes).
`children`	`Vec<u32>`	`/* serde(default) */`	Indices of child nodes in reading order.
`annotations`	`Vec<TextAnnotation>`	`/* serde(default) */`	Inline formatting annotations (bold, italic, links, etc.) with byte offsets into the text.
`attributes`	`HashMap<String, String>`	`None`	Format-specific attributes preserved from the source HTML element. Keys are lowercased attribute names as they appear in the HTML (e.g. `"class"`, `"id"`, `"data-foo"`). Values are the raw attribute strings, copied verbatim from the source — no HTML entity decoding is applied here. The map is `None` when no attributes are present (omitted entirely in serialized output). Not every HTML attribute is preserved: only attributes that carry semantic or structural significance for the node type are collected. For example, heading nodes capture the `"id"` attribute for anchor linking; other element-level attributes may be silently dropped.

GridCell¶

A single cell in a table grid.

Field	Type	Default	Description
`content`	`String`	—	The text content of the cell.
`row`	`u32`	—	0-indexed row position.
`col`	`u32`	—	0-indexed column position.
`row_span`	`u32`	`/* serde(default) */`	Number of rows this cell spans (default 1).
`col_span`	`u32`	`/* serde(default) */`	Number of columns this cell spans (default 1).
`is_header`	`bool`	`/* serde(default) */`	Whether this is a header cell (`<th>`).

TableData¶

A top-level extracted table with both structured data and markdown representation.

Field	Type	Default	Description
`grid`	`TableGrid`	—	The structured table grid.
`markdown`	`String`	—	The markdown rendering of this table.

NodeContext¶

Context information passed to all visitor methods.

Provides comprehensive metadata about the current node being visited, including its type, attributes, position in the DOM tree, and parent context.

Field	Type	Default	Description
`node_type`	`NodeType`	—	Coarse-grained node type classification
`tag_name`	`String`	—	Raw HTML tag name (e.g., "div", "h1", "custom-element")
`attributes`	`HashMap<String, String>`	—	All HTML attributes as key-value pairs
`depth`	`usize`	—	Depth in the DOM tree (0 = root)
`index_in_parent`	`usize`	—	Index among siblings (0-based)
`parent_tag`	`Option<String>`	`None`	Parent element's tag name (None if root)
`is_inline`	`bool`	—	Whether this element is treated as inline vs block

Other Types¶

StructuredData¶

Structured data block (JSON-LD, Microdata, or RDFa).

Represents machine-readable structured data found in the document. JSON-LD blocks are collected as raw JSON strings for flexibility.

Field	Type	Default	Description
`data_type`	`StructuredDataType`	—	Type of structured data (JSON-LD, Microdata, RDFa)
`raw_json`	`String`	—	Raw JSON string (for JSON-LD) or serialized representation
`schema_type`	`Option<String>`	`None`	Schema type if detectable (e.g., "Article", "Event", "Product")

TextAnnotation¶

A styling or semantic annotation that applies to a byte range within a node's text.

Unlike DocumentNode, which captures block-level structure (headings, paragraphs, etc.), a TextAnnotation describes inline-level markup — bold, italic, links, code spans, and similar — that spans a contiguous run of bytes inside DocumentNode.content's text field.

Byte offsets (start..end) are into the UTF-8 encoded text of the parent node. The range follows Rust slice conventions: start is inclusive and end is exclusive, so the annotated text is text[start as usize..end as usize].

Multiple annotations on the same node can overlap (e.g. bold-italic text), and they are stored in the order they are encountered during DOM traversal.

See AnnotationKind for the full list of supported annotation types.

Field	Type	Default	Description
`start`	`u32`	—	Start byte offset (inclusive) into the parent node's text.
`end`	`u32`	—	End byte offset (exclusive) into the parent node's text.
`kind`	`AnnotationKind`	—	The type of annotation.

ProcessingWarning¶

A non-fatal diagnostic produced during HTML conversion.

Warnings indicate that conversion completed but some content may have been handled differently than expected — for example, an image that could not be extracted, a truncated input, or malformed HTML that was repaired with best-effort parsing.

Conversion always succeeds (returns ConversionResult) even when warnings are present. Callers should inspect warnings and decide how to handle them based on their tolerance for partial results:

Logging pipelines: emit each warning at WARN level and continue.
Strict pipelines: treat any warning as a hard error by checking result.warnings.is_empty() before using the output.

See WarningKind for the full taxonomy of warning categories.

Field	Type	Default	Description
`message`	`String`	—	Human-readable warning message.
`kind`	`WarningKind`	—	The category of warning.

VisitorHandle¶

Shareable, thread-safe handle to a user-provided HTML visitor implementation.

Pass an instance wrapped in this handle to ConversionOptions to customise how the HTML document is traversed and converted to Markdown. The handle may be cloned and shared across threads without additional synchronisation on the caller's side.

Opaque type — fields are not directly accessible.

HtmlVisitor¶

Visitor for HTML→Markdown conversion.

Provide a visitor object whose methods customize the conversion behavior for any HTML element type. Override only the methods you care about; unimplemented methods default to Continue (emit the standard rendering).

Each callback returns one of:

Continue (the default) — keep the standard rendering.
Skip — drop the element from the output entirely.
PreserveHtml — pass the original HTML through verbatim.
Custom(text) — replace the rendering with text.
Error(message) — abort conversion with message.

Language idioms. In Rust, return one of the VisitResult variants directly. In Python, Ruby, JavaScript/TypeScript, and other duck-typed bindings, define a plain class (no base class required) and return either a string ("continue", "skip", "preserve_html") or a tagged map ({"custom": "..."}, {"error": "..."}) — the binding converts the return value to the corresponding VisitResult variant automatically.

Method Naming Convention¶

visit_*_start: Called before entering an element (pre-order traversal)
visit_*_end: Called after exiting an element (post-order traversal)
visit_*: Called for specific element types (e.g., visit_link, visit_image)

Execution Order¶

For a typical element like <div><p>text</p></div>:

visit_element_start for <div>
visit_element_start for <p>
visit_text for "text"
visit_element_end for <p>
visit_element_end for </div>

Performance Notes¶

visit_text is the most frequently called method (~100+ times per document)
Return Continue quickly for elements you don't need to customize
Avoid heavy computation in visitor methods; consider caching if needed

Opaque type — fields are not directly accessible.

Enums¶

AnnotationKind¶

The type of an inline text annotation.

Uses internally tagged representation ("annotation_type": "bold") for JSON serialization.

Variant	Wire value	Description
`Bold`	`bold`	Bold / strong emphasis.
`Italic`	`italic`	Italic / emphasis.
`Underline`	`underline`	Underline.
`Strikethrough`	`strikethrough`	Strikethrough / deleted text.
`Code`	`code`	Inline code.
`Subscript`	`subscript`	Subscript text.
`Superscript`	`superscript`	Superscript text.
`Highlight`	`highlight`	Highlighted / marked text.
`Link`	`link`	A hyperlink sourced from an `<a href="../...">` element. — Fields: `url`: `String`, `title`: `String`

CodeBlockStyle¶

Code block fence style in Markdown output.

Determines how code blocks (<pre><code>) are rendered in Markdown.

Variant	Description
`Indented`	Indented code blocks (4 spaces). `CommonMark` standard.
`Backticks`	Fenced code blocks with backticks (```). Default (GFM). Supports language hints.
`Tildes`	Fenced code blocks with tildes (~~~). Supports language hints.

HeadingStyle¶

Heading style options for Markdown output.

Controls how headings (h1-h6) are rendered in the output Markdown.

Variant	Description
`Underlined`	Underlined style (=== for h1, --- for h2).
`Atx`	ATX style (# for h1, ## for h2, etc.). Default.
`AtxClosed`	ATX closed style (# title #, with closing hashes).

HighlightStyle¶

Highlight rendering style for <mark> elements.

Controls how highlighted text is rendered in Markdown output.

Variant	Description
`DoubleEqual`	Double equals syntax (text). Default. Pandoc-compatible.
`Html`	Preserve as HTML (text). Original HTML tag.
`Bold`	Render as bold (text). Uses strong emphasis.
`None`	Strip formatting, render as plain text. No markup.

ImageType¶

Image source classification for proper handling and processing.

Determines whether an image is embedded (data URI), inline SVG, external, or relative.

Variant	Wire value	Description
`DataUri`	`data_uri`	Data URI embedded image (base64 or other encoding)
`InlineSvg`	`inline_svg`	Inline SVG element
`External`	`external`	External image URL (http/https)
`Relative`	`relative`	Relative image path

LinkStyle¶

Link rendering style in Markdown output.

Controls whether links and images use inline [text](url) syntax or reference-style [text][1] syntax with definitions collected at the end.

Variant	Description
`Inline`	Inline links: `[text](url)`. Default.
`Reference`	Reference-style links: `[text][1]` with `[1]: url` at end of document.

LinkType¶

Link classification based on href value and document context.

Used to categorize links during extraction for filtering and analysis.

Variant	Wire value	Description
`Anchor`	`anchor`	Anchor link within same document (href starts with #)
`Internal`	`internal`	Internal link within same domain
`External`	`external`	External link to different domain
`Email`	`email`	Email link (mailto:)
`Phone`	`phone`	Phone link (tel:)
`Other`	`other`	Other protocol or unclassifiable

ListIndentType¶

List indentation character type.

Controls whether list items are indented with spaces or tabs.

Variant	Description
`Spaces`	Use spaces for indentation. Default. Width controlled by `list_indent_width`.
`Tabs`	Use tabs for indentation.

NewlineStyle¶

Line break syntax in Markdown output.

Controls how soft line breaks (from <br> or line breaks in source) are rendered.

Variant	Description
`Spaces`	Two trailing spaces at end of line. Default. Standard Markdown syntax.
`Backslash`	Backslash at end of line. Alternative Markdown syntax.

NodeContent¶

The semantic content type of a document node.

Uses internally tagged representation ("node_type": "heading") for JSON serialization.

Variant	Wire value	Description
`Heading`	`heading`	A heading element (h1-h6). — Fields: `level`: `u8`, `text`: `String`
`Paragraph`	`paragraph`	A paragraph of text. — Fields: `text`: `String`
`List`	`list`	A list container (ordered or unordered). Children are `ListItem` nodes. — Fields: `ordered`: `bool`
`ListItem`	`list_item`	A single list item. — Fields: `text`: `String`
`Table`	`table`	A table with structured cell data. — Fields: `grid`: `TableGrid`
`Image`	`image`	An image element. — Fields: `description`: `String`, `src`: `String`, `image_index`: `u32`
`Code`	`code`	A code block or inline code. — Fields: `text`: `String`, `language`: `String`
`Quote`	`quote`	A block quote container.
`DefinitionList`	`definition_list`	A definition list container.
`DefinitionItem`	`definition_item`	A definition list entry with term and description. — Fields: `term`: `String`, `definition`: `String`
`RawBlock`	`raw_block`	A raw block preserved as-is (e.g. `<script>`, `<style>` content). — Fields: `format`: `String`, `content`: `String`
`MetadataBlock`	`metadata_block`	A block of key-value metadata pairs (from `<head>` meta tags). — Fields: `entries`: `Vec<Vec<String>>`
`Group`	`group`	A section grouping container (auto-generated from heading hierarchy). — Fields: `label`: `String`, `heading_level`: `u8`, `heading_text`: `String`

NodeType¶

Node type enumeration covering all HTML element types.

This enum categorizes all HTML elements that the converter recognizes, providing a coarse-grained classification for visitor dispatch.

Variant	Description
`Text`	Text node (most frequent - 100+ per document)
`Element`	Generic element node
`Heading`	Heading elements (h1-h6)
`Paragraph`	Paragraph element
`Div`	Generic div container
`Blockquote`	Blockquote element
`Pre`	Preformatted text block
`Hr`	Horizontal rule
`List`	Ordered or unordered list (ul, ol)
`ListItem`	List item (li)
`DefinitionList`	Definition list (dl)
`DefinitionTerm`	Definition term (dt)
`DefinitionDescription`	Definition description (dd)
`Table`	Table element
`TableRow`	Table row (tr)
`TableCell`	Table cell (td, th)
`TableHeader`	Table header cell (th)
`TableBody`	Table body (tbody)
`TableHead`	Table head (thead)
`TableFoot`	Table foot (tfoot)
`Link`	Anchor link (a)
`Image`	Image (img)
`Strong`	Strong/bold (strong, b)
`Em`	Emphasis/italic (em, i)
`Code`	Inline code (code)
`Strikethrough`	Strikethrough (s, del, strike)
`Underline`	Underline (u, ins)
`Subscript`	Subscript (sub)
`Superscript`	Superscript (sup)
`Mark`	Mark/highlight (mark)
`Small`	Small text (small)
`Br`	Line break (br)
`Span`	Span element
`Article`	Article element
`Section`	Section element
`Nav`	Navigation element
`Aside`	Aside element
`Header`	Header element
`Footer`	Footer element
`Main`	Main element
`Figure`	Figure element
`Figcaption`	Figure caption
`Time`	Time element
`Details`	Details element
`Summary`	Summary element
`Form`	Form element
`Input`	Input element
`Select`	Select element
`Option`	Option element
`Button`	Button element
`Textarea`	Textarea element
`Label`	Label element
`Fieldset`	Fieldset element
`Legend`	Legend element
`Audio`	Audio element
`Video`	Video element
`Picture`	Picture element
`Source`	Source element
`Iframe`	Iframe element
`Svg`	SVG element
`Canvas`	Canvas element
`Ruby`	Ruby annotation
`Rt`	Ruby text
`Rp`	Ruby parenthesis
`Abbr`	Abbreviation
`Kbd`	Keyboard input
`Samp`	Sample output
`Var`	Variable
`Cite`	Citation
`Q`	Quote
`Del`	Deleted text
`Ins`	Inserted text
`Data`	Data element
`Meter`	Meter element
`Progress`	Progress element
`Output`	Output element
`Template`	Template element
`Slot`	Slot element
`Html`	HTML root element
`Head`	Head element
`Body`	Body element
`Title`	Title element
`Meta`	Meta element
`LinkTag`	Link element (not anchor)
`Style`	Style element
`Script`	Script element
`Base`	Base element
`Custom`	Custom element (web components) or unknown tag

OutputFormat¶

Output format for conversion.

Specifies the target markup language format for the conversion output.

Variant	Description
`Markdown`	Standard Markdown (CommonMark compatible). Default.
`Djot`	Djot lightweight markup language.
`Plain`	Plain text output (no markup, visible text only).

PreprocessingPreset¶

HTML preprocessing aggressiveness level.

Controls the extent of cleanup performed before conversion. Higher levels remove more elements.

Variant	Description
`Minimal`	Minimal cleanup. Remove only essential noise (scripts, styles).
`Standard`	Standard cleanup. Default. Removes navigation, forms, and other auxiliary content.
`Aggressive`	Aggressive cleanup. Remove extensive non-content elements and structure.

StructuredDataType¶

Structured data format type.

Identifies the schema/format used for structured data markup.

Variant	Wire value	Description
`JsonLd`	`json_ld`	JSON-LD (JSON for Linking Data) script blocks
`Microdata`	`microdata`	HTML5 Microdata attributes (itemscope, itemtype, itemprop)
`RDFa`	`rdfa`	RDF in Attributes (RDFa) markup

TextDirection¶

Text directionality of document content.

Corresponds to the HTML dir attribute and bdi element directionality.

Variant	Wire value	Description
`LeftToRight`	`ltr`	Left-to-right text flow (default for Latin scripts)
`RightToLeft`	`rtl`	Right-to-left text flow (Hebrew, Arabic, Urdu, etc.)
`Auto`	`auto`	Automatic directionality detection

VisitResult¶

Result of a visitor callback.

Allows visitors to control the conversion flow by either proceeding with default behavior, providing custom output, skipping elements, preserving HTML, or signaling errors.

Variant	Description
`Continue`	Continue with default conversion behavior
`Custom`	Replace default output with custom markdown The visitor takes full responsibility for the markdown output of this node and its children. — Fields: `_0`: `String`
`Skip`	Skip this element entirely (don't output anything) The element and all its children are ignored in the output.
`PreserveHtml`	Preserve original HTML (don't convert to markdown) The element's raw HTML is included verbatim in the output.
`Error`	Stop conversion with an error The conversion process halts and returns this error message. — Fields: `_0`: `String`

WarningKind¶

Categories of processing warnings.

Variant	Wire value	Description
`ImageExtractionFailed`	`image_extraction_failed`	An image could not be extracted (e.g. invalid data URI, unsupported format).
`EncodingFallback`	`encoding_fallback`	The input encoding was not recognized; fell back to UTF-8.
`TruncatedInput`	`truncated_input`	The input was truncated due to size limits.
`MalformedHtml`	`malformed_html`	The HTML was malformed but processing continued with best effort.
`SanitizationApplied`	`sanitization_applied`	Sanitization was applied to remove potentially unsafe content.
`DepthLimitExceeded`	`depth_limit_exceeded`	DOM traversal was truncated because max_depth was exceeded.

WhitespaceMode¶

Whitespace handling strategy during conversion.

Determines how sequences of whitespace characters (spaces, tabs, newlines) are processed.

Variant	Description
`Normalized`	Collapse multiple whitespace characters to single spaces. Default. Matches browser behavior.
`Strict`	Preserve all whitespace exactly as it appears in the HTML.

Edit this page on GitHub