Visitor Pattern v2.23.0
The visitor pattern allows you to customize how specific HTML elements are converted to Markdown. Instead of modifying the library itself, you provide callback functions that intercept elements during conversion and return custom output.
What Is the Visitor Pattern?
In html-to-markdown, a visitor is an object (or set of callbacks) that receives notifications as the conversion engine encounters HTML elements. For each element type, you can implement a handler that decides what to do:
- Continue -- use the default conversion logic
- Custom -- replace the default output with your own Markdown
- Skip -- omit the element entirely from output
- Preserve HTML -- keep the raw HTML in the Markdown output
This gives you fine-grained control over conversion behavior without forking the library or writing a full converter from scratch.
How It Works in html-to-markdown
During DOM traversal, the conversion engine checks whether a visitor is registered. When an element matches a visitor callback, the engine:
- Builds a NodeContext with element metadata (tag name, attributes, depth, parent info)
- Calls the appropriate visitor method (e.g., visit_link, visit_heading, visit_image)
- Inspects the return value to determine output behavior
- Falls back to default conversion if no callback is registered for that element type
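The dispatch steps above can be sketched in a few lines of Python. This is an illustrative model only: `Continue`, `Custom`, `default_link`, `convert_link`, and `HttpsVisitor` are stand-ins defined here, not the library's own types or functions, and the context is a plain dict for brevity.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-in result types for illustration; not the library's own classes.
@dataclass
class Continue:
    pass


@dataclass
class Custom:
    markdown: str


def default_link(href: str, text: str) -> str:
    """Default Markdown the engine would emit for a link."""
    return f"[{text}]({href})"


def convert_link(visitor: Optional[object], ctx: dict, href: str, text: str) -> str:
    """Call visit_link if the visitor defines it, else use the default output."""
    callback = getattr(visitor, "visit_link", None)
    if callback is None:
        return default_link(href, text)  # no callback registered
    result = callback(ctx, href, text, None)
    if isinstance(result, Custom):
        return result.markdown  # visitor replaced the output
    return default_link(href, text)  # Continue: keep the default output


class HttpsVisitor:
    """Hypothetical visitor that upgrades http:// links to https://."""

    def visit_link(self, ctx, href, text, title):
        if href.startswith("http://"):
            return Custom(f"[{text}](https://{href[len('http://'):]})")
        return Continue()
```

Only the `Continue` and `Custom` branches are modelled here; a fuller engine would also handle the remaining result types.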
sequenceDiagram
participant Engine as Conversion Engine
participant Visitor as User Visitor
Engine->>Engine: Encounter <a href="...">
Engine->>Visitor: visit_link(context, href, text, title)
Visitor-->>Engine: Custom("[text](modified-url)")
Engine->>Engine: Use custom output
Engine->>Engine: Encounter <h1>
Engine->>Visitor: visit_heading(context, level, text, id)
Visitor-->>Engine: Continue
Engine->>Engine: Use default "# text" output
Visit Result Types
Every visitor callback returns one of these result types:
| Result | Behavior |
|---|---|
| Continue | Proceed with the default conversion for this element |
| Custom(markdown) | Replace the element's output with the provided Markdown string |
| Skip | Remove the element entirely from output |
| PreserveHtml | Include the raw HTML verbatim in the Markdown |
| Error(reason) | Stop conversion and return an error |
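One way to picture the five results is as a small tagged union that the engine maps to output. The sketch below assumes Python dataclasses named after the table rows and a hypothetical `apply_result` helper; the actual bindings may represent results differently.

```python
from dataclasses import dataclass
from typing import Union


# Illustrative stand-ins named after the table rows above.
@dataclass(frozen=True)
class Continue:
    pass


@dataclass(frozen=True)
class Custom:
    markdown: str


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class PreserveHtml:
    pass


@dataclass(frozen=True)
class Error:
    reason: str


VisitResult = Union[Continue, Custom, Skip, PreserveHtml, Error]


def apply_result(result: VisitResult, default_md: str, raw_html: str) -> str:
    """Map a visitor result to the text emitted for one element."""
    if isinstance(result, Custom):
        return result.markdown
    if isinstance(result, Skip):
        return ""  # element is omitted from the output
    if isinstance(result, PreserveHtml):
        return raw_html  # raw HTML passes through verbatim
    if isinstance(result, Error):
        raise ValueError(result.reason)  # conversion stops
    return default_md  # Continue
```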
NodeContext
All visitor callbacks receive a context object with metadata about the current element:
| Field | Type | Description |
|---|---|---|
| node_type | enum | Coarse classification (link, heading, image, etc.) |
| tag_name | string | Raw HTML tag name (a, h1, img, etc.) |
| attributes | map | HTML attributes as key-value pairs |
| depth | integer | Nesting depth in the DOM tree |
| index_in_parent | integer | Zero-based sibling index |
| parent_tag | string/null | Parent element's tag name |
| is_inline | boolean | Whether the element is treated as inline |
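As a rough mental model, the fields in the table could be mirrored by a Python dataclass like the one below. The class, the `NodeType` members, and the helper function are assumptions made for illustration; each binding exposes the context in its own way.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class NodeType(Enum):
    # Only a few illustrative members; the real enum is larger.
    LINK = "link"
    HEADING = "heading"
    IMAGE = "image"
    OTHER = "other"


@dataclass
class NodeContext:
    node_type: NodeType
    tag_name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    depth: int = 0
    index_in_parent: int = 0
    parent_tag: Optional[str] = None
    is_inline: bool = False


def is_first_ordered_item(ctx: NodeContext) -> bool:
    """Example query: is this the first <li> of an <ol>?"""
    return ctx.tag_name == "li" and ctx.parent_tag == "ol" and ctx.index_in_parent == 0
```

A callback can use such fields to make position-dependent decisions, for example treating the first list item differently from the rest.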
Support Matrix
Not all language bindings support the visitor pattern. The table below shows current support:
| Binding | Visitor | Async Visitor | Best For |
|---|---|---|---|
| Rust | Yes | Yes (Tokio) | Core library, maximum performance |
| Python | Yes | Yes (asyncio) | Server-side processing, data pipelines |
| TypeScript | Yes | Yes (Promise) | Node.js / Bun server-side, SSR |
| Ruby | Yes | No | Rails, Sinatra, content management |
| PHP | Yes | No | WordPress, Laravel, CMS platforms |
| Elixir | Yes | No | Phoenix, high-concurrency services |
| R | No | -- | Statistical analysis, basic conversion |
| C | Yes | No | Embedded, system-level integration |
| Go | No | -- | Basic conversion, microservices |
| Java | No | -- | Basic conversion, enterprise apps |
| C# | No | -- | Basic conversion, .NET applications |
| WASM | No | -- | Browser/edge, Cloudflare Workers |
Async visitor support
Async visitors allow callback functions to perform I/O operations (HTTP requests, database lookups, file reads) during conversion. They are available in Python (via asyncio), TypeScript (via Promise), and Rust (via Tokio).
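A minimal asyncio sketch of the idea follows. It assumes an async `visit_link` with the same arguments as the synchronous form; `lookup_title` is a hypothetical stand-in for real I/O, and `Custom` is defined locally rather than imported from the library.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Custom:
    markdown: str


async def lookup_title(href: str) -> str:
    """Hypothetical I/O call; a real one might query an HTTP API or database."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"/docs": "Documentation"}.get(href, href)


class TitleVisitor:
    async def visit_link(self, ctx, href, text, title):
        # Replace the link text with a title resolved asynchronously.
        resolved = await lookup_title(href)
        return Custom(f"[{resolved}]({href})")


async def main() -> str:
    result = await TitleVisitor().visit_link({}, "/docs", "click here", None)
    return result.markdown
```

Running `asyncio.run(main())` drives the callback outside of any engine, which is a convenient way to unit-test async visitor logic.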
Available Callbacks
Visitor implementations can override any combination of these callbacks. Unimplemented callbacks default to Continue (standard conversion).
Text and Formatting
| Callback | HTML Element(s) |
|---|---|
| visit_text | Text nodes |
| visit_strong | <strong>, <b> |
| visit_emphasis | <em>, <i> |
| visit_strikethrough | <s>, <del>, <strike> |
| visit_underline | <u>, <ins> |
| visit_subscript | <sub> |
| visit_superscript | <sup> |
| visit_mark | <mark> |
Links and Media
| Callback | HTML Element(s) |
|---|---|
| visit_link | <a> |
| visit_image | <img> |
| visit_audio | <audio> |
| visit_video | <video> |
| visit_iframe | <iframe> |
Code
| Callback | HTML Element(s) |
|---|---|
| visit_code_block | <pre><code> |
| visit_code_inline | <code> |
Structure
| Callback | HTML Element(s) |
|---|---|
| visit_heading | <h1> through <h6> |
| visit_blockquote | <blockquote> |
| visit_horizontal_rule | <hr> |
| visit_line_break | <br> |
Lists
| Callback | HTML Element(s) |
|---|---|
| visit_list_start | <ul>, <ol> |
| visit_list_item | <li> |
| visit_list_end | End of <ul>, <ol> |
Tables
| Callback | HTML Element(s) |
|---|---|
| visit_table_start | <table> |
| visit_table_row | <tr> |
| visit_table_end | End of <table> |
Generic Hooks
| Callback | Description |
|---|---|
| visit_element_start | Called before entering any element |
| visit_element_end | Called after exiting any element |
| visit_custom_element | Web components or unknown tags |
Common Use Cases
Content Filtering
Remove unwanted elements from conversion output:
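A filtering visitor might look roughly like this. The sketch assumes that `visit_element_start` receives only the context, that the context is a dict with an `attributes` entry, and that returning `Skip` from this hook drops the element; none of these details are confirmed here. `BLOCKED_CLASSES` is a made-up list.

```python
from dataclasses import dataclass


@dataclass
class Continue:
    pass


@dataclass
class Skip:
    pass


# Hypothetical CSS classes that mark content to drop.
BLOCKED_CLASSES = {"advert", "cookie-banner"}


class FilterVisitor:
    def visit_element_start(self, ctx):
        # The class attribute is a space-separated token list.
        classes = set(ctx.get("attributes", {}).get("class", "").split())
        if classes & BLOCKED_CLASSES:
            return Skip()  # omit the element from the output
        return Continue()  # everything else converts normally
```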
URL Rewriting
Transform URLs during conversion (e.g., converting relative paths to absolute, adding tracking parameters, or switching CDN domains):
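For instance, relative links can be resolved against a base URL with the standard library's `urljoin`. In this sketch `BASE_URL` is a placeholder, `Custom` is a local stand-in for the library's result type, and the `visit_link` arguments follow the sequence diagram earlier on this page.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass
class Custom:
    markdown: str


BASE_URL = "https://example.com/docs/"  # placeholder base URL


class AbsoluteLinkVisitor:
    def visit_link(self, ctx, href, text, title):
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        absolute = urljoin(BASE_URL, href)
        suffix = f' "{title}"' if title else ""
        return Custom(f"[{text}]({absolute}{suffix})")
```

The same shape works for switching CDN domains or appending tracking parameters: compute the new URL, then return it inside a custom Markdown link.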
Domain-Specific Dialects
Generate non-standard Markdown for specific platforms:
- Slack/Discord: Custom emoji syntax, mention formatting
- Confluence: Wiki-style links and macros
- Obsidian: Internal link syntax [[page]]
- MDX: JSX component embedding
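Taking the Obsidian case as an example, a visitor could turn relative links into wiki-style links and leave external ones alone. The result classes are local stand-ins, and deriving the page name from the file stem is just one possible convention.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Continue:
    pass


@dataclass
class Custom:
    markdown: str


class ObsidianVisitor:
    def visit_link(self, ctx, href, text, title):
        # Leave external links, anchors, and mail links to the default logic.
        if "://" in href or href.startswith(("#", "mailto:")):
            return Continue()
        page = PurePosixPath(href).stem  # "notes/setup.html" -> "setup"
        if text == page:
            return Custom(f"[[{page}]]")
        return Custom(f"[[{page}|{text}]]")  # alias form: [[page|label]]
```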
Content Enrichment
Add metadata or annotations during conversion:
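def visit_heading(ctx, level, text, id):
    # generate_anchor is a user-supplied helper that turns text into a slug
    anchor = generate_anchor(text)
    # Emits e.g. "## Title {#title}"
    return Custom(f"{'#' * level} {text} {{#{anchor}}}")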
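The snippet above leaves `generate_anchor` undefined. A self-contained variant, with a simple slug function and a local `Custom` stand-in, might look like the following; the slug rules (lowercase, hyphen-separated, ASCII alphanumerics only) and the fallback to an existing `id` are assumptions for this sketch.

```python
import re
from dataclasses import dataclass


@dataclass
class Custom:
    markdown: str


def generate_anchor(text: str) -> str:
    """Lowercase the text and join alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


def visit_heading(ctx, level, text, id):
    # Prefer an id already present in the HTML; otherwise derive one.
    anchor = id or generate_anchor(text)
    return Custom(f"{'#' * level} {text} {{#{anchor}}}")
```

The `{#anchor}` suffix is a heading-attribute syntax understood by some Markdown processors, so check that the target renderer supports it before relying on it.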
Security Filtering
Sanitize or rewrite potentially dangerous content:
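One common case is neutralising links with script-like schemes. The sketch below keeps the link text but drops the link when the scheme is outside an allowlist; `SAFE_SCHEMES` and the local result classes are illustrative assumptions, and a visitor like this complements a dedicated HTML sanitizer rather than replacing one.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Continue:
    pass


@dataclass
class Custom:
    markdown: str


# Empty scheme covers relative URLs such as "/about".
SAFE_SCHEMES = {"", "http", "https", "mailto"}


class SafeLinkVisitor:
    def visit_link(self, ctx, href, text, title):
        scheme = urlsplit(href.strip()).scheme.lower()
        if scheme not in SAFE_SCHEMES:
            return Custom(text)  # keep the text, drop the unsafe target
        return Continue()
```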
Further Reading
- Visitor Pattern Guide -- step-by-step implementation guide with code examples
- Conversion Pipeline -- how visitors fit into the overall pipeline
- Configuration Options -- non-visitor configuration options