R API Reference v2.25.2
Package: htmltomarkdown | Version: 2.28.1 | R: 4.3+
Installation
The package uses extendr to bind to the Rust core library.
Functions
convert
Convert HTML to Markdown.
Arguments:
| Parameter | Type | Description |
|---|---|---|
| html | character | A character string of HTML content |
Returns: character -- the converted Markdown string.
Example:
```r
library(htmltomarkdown)

html <- "<h1>Hello</h1><p>World</p>"
markdown <- convert(html)
cat(markdown)
```
convert_with_options
Convert HTML to Markdown with options provided as a named list.
Arguments:
| Parameter | Type | Description |
|---|---|---|
| html | character | HTML content |
| options | list | Named list of conversion options |
Returns: character -- the converted Markdown string.
Example:
```r
html <- "<h1>Hello</h1><p>World</p>"
options <- list(
  heading_style = "atx",
  wrap = TRUE,
  wrap_width = 80
)
markdown <- convert_with_options(html, options)
```
convert_with_metadata
Convert HTML to Markdown and extract document metadata.
Arguments:
| Parameter | Type | Description |
|---|---|---|
| html | character | HTML content |
| options | list or NULL | Optional conversion options |
| config | list or NULL | Optional metadata extraction configuration |
Returns: list with markdown and metadata elements.
Example:
```r
html <- '<html lang="en"><head><title>Article</title></head>
<body><h1>Title</h1><a href="https://example.com">Link</a></body></html>'

result <- convert_with_metadata(html)
cat(result$markdown)
print(result$metadata$document$title)  # "Article"
print(length(result$metadata$headers)) # 1
print(length(result$metadata$links))   # 1

# Selective extraction
config <- list(
  extract_headers = TRUE,
  extract_links = TRUE,
  extract_images = FALSE
)
result <- convert_with_metadata(html, config = config)
```
convert_with_inline_images
Convert HTML and extract inline images.
Returns: list with markdown, images, and warnings elements.
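The shape of the returned list can be inspected with a small helper. The sketch below is illustrative only: `summarise_inline_result` is a hypothetical name, not part of the package, and it relies solely on the three element names listed above. It also assumes `convert_with_inline_images()` takes the HTML string as its first argument, which this page does not state.

```r
# Hypothetical helper (not part of htmltomarkdown): summarise the list
# returned by convert_with_inline_images() using only its documented
# element names (markdown, images, warnings).
summarise_inline_result <- function(result) {
  stopifnot(is.list(result), is.character(result$markdown))
  list(
    markdown_chars = sum(nchar(result$markdown)),
    n_images       = length(result$images),
    n_warnings     = length(result$warnings)
  )
}

# Stand-in result so the sketch runs without the package installed.
# With the package, one would presumably write (signature is an assumption):
#   result <- htmltomarkdown::convert_with_inline_images(html)
result <- list(markdown = "# Hi", images = list(), warnings = character(0))
str(summarise_inline_result(result))
```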
convert_with_visitor
Convert HTML with a visitor object (reserved for future use).
create_options_handle
Create a reusable options handle for repeated conversions.
convert_with_options_handle
Convert using a pre-created options handle.
Example:
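```r
handle <- create_options_handle(list(heading_style = "atx"))

# html_documents: a character vector of HTML strings to convert
html_documents <- c("<h1>One</h1>", "<p>Two</p>")
for (html in html_documents) {
  markdown <- convert_with_options_handle(html, handle)
  cat(markdown)
}
```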
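When the converted strings need to be kept rather than printed, a vectorised form may be more convenient than a loop. The following is a minimal sketch: `convert_all` and `stand_in` are hypothetical names introduced for illustration, and the stand-in converter exists only so the block runs without the package installed.

```r
# Hypothetical helper (not part of htmltomarkdown): apply a single-document
# converter to every element and return a character vector of results.
convert_all <- function(html_documents, convert_one) {
  vapply(html_documents, convert_one, character(1), USE.NAMES = FALSE)
}

# Stand-in converter so the sketch runs on its own.
# With the package, one would pass something like
#   function(html) convert_with_options_handle(html, handle)
stand_in <- function(html) toupper(html)

results <- convert_all(c("<h1>One</h1>", "<p>Two</p>"), stand_in)
print(results)
```

`vapply` with `character(1)` fails early if a converter ever returns something other than a single string, which a plain `sapply` would silently accept.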
version
Get the version of the html-to-markdown Rust core.
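One possible use is a guard that checks the core version before relying on newer behaviour. This sketch assumes `version()` returns a character string such as "2.28.1", which is not confirmed here; `core_at_least` is a hypothetical helper and the minimum version shown is arbitrary.

```r
# Hypothetical helper (not part of htmltomarkdown): TRUE when the reported
# core version is at least `minimum`. Assumes dotted version strings.
core_at_least <- function(core_version, minimum) {
  numeric_version(core_version) >= numeric_version(minimum)
}

# With the package installed, one would pass htmltomarkdown::version() as
# core_version; a literal string is used here so the sketch runs on its own.
print(core_at_least("2.28.1", "2.25.0"))
```

`numeric_version` compares component by component, so "2.10.0" correctly sorts after "2.9.0", unlike a plain string comparison.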
Options List
Options are passed as named R lists. All fields are optional.
```r
options <- list(
  heading_style = "atx",          # "underlined", "atx", "atx_closed"
  list_indent_type = "spaces",    # "spaces", "tabs"
  list_indent_width = 2,
  bullets = "-",
  code_block_style = "indented",  # "indented", "backticks", "tildes"
  whitespace_mode = "normalized", # "normalized", "strict"
  wrap = FALSE,
  wrap_width = 80,
  newline_style = "spaces",       # "spaces", "backslash"
  preserve_tags = c(),
  strip_tags = c(),
  skip_images = FALSE,
  output_format = "markdown"      # "markdown", "djot", "plain"
)
```
See the Configuration Reference for detailed descriptions.
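Because options are plain named lists, they can be assembled and checked in ordinary R before being passed to a conversion function. The sketch below overlays overrides on a base list and rejects unknown values for the enumerated fields. `build_options` and `allowed_values` are hypothetical names, and the allowed values are copied from the comments in the options list on this page. Whether the package itself validates these values is not stated here.

```r
# Allowed values copied from the comments in the options list on this page.
allowed_values <- list(
  heading_style    = c("underlined", "atx", "atx_closed"),
  list_indent_type = c("spaces", "tabs"),
  code_block_style = c("indented", "backticks", "tildes"),
  whitespace_mode  = c("normalized", "strict"),
  newline_style    = c("spaces", "backslash"),
  output_format    = c("markdown", "djot", "plain")
)

# Hypothetical helper (not part of htmltomarkdown): overlay `overrides` on
# `base` and stop on an unknown value for any enumerated field.
build_options <- function(overrides = list(), base = list()) {
  opts <- utils::modifyList(base, overrides)
  for (name in intersect(names(opts), names(allowed_values))) {
    if (!isTRUE(opts[[name]] %in% allowed_values[[name]])) {
      stop(sprintf("invalid value '%s' for option '%s'",
                   toString(opts[[name]]), name))
    }
  }
  opts
}

opts <- build_options(
  overrides = list(wrap = TRUE, wrap_width = 100),
  base      = list(heading_style = "atx", wrap = FALSE)
)
str(opts)
```

The resulting list can then be passed wherever a named options list is accepted, for example as the second argument of `convert_with_options`.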
Metadata Config List
```r
config <- list(
  extract_document = TRUE,
  extract_headers = TRUE,
  extract_links = TRUE,
  extract_images = TRUE,
  extract_structured_data = TRUE,
  max_structured_data_size = 1000000
)
```
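A config that enables only some sections can be generated from a vector of section names instead of being typed out flag by flag. This is a sketch under assumptions: `metadata_config` and `metadata_sections` are hypothetical names, and the section names are derived from the `extract_*` flags listed on this page. The sketch leaves `max_structured_data_size` unset.

```r
# Section names taken from the extract_* flags listed on this page.
metadata_sections <- c("document", "headers", "links", "images",
                       "structured_data")

# Hypothetical helper (not part of htmltomarkdown): build a config list in
# which only the sections named in `wanted` are enabled.
metadata_config <- function(wanted = metadata_sections) {
  unknown <- setdiff(wanted, metadata_sections)
  if (length(unknown) > 0) {
    stop("unknown metadata section(s): ", toString(unknown))
  }
  flags <- as.list(metadata_sections %in% wanted)
  names(flags) <- paste0("extract_", metadata_sections)
  flags
}

config <- metadata_config(c("headers", "links"))
str(config)
```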
Extendr Details
The R binding uses extendr to compile Rust code into a shared library loaded by R:
- Rust source in packages/r/src/rust/
- Compiled during package installation
- R wrappers auto-generated in R/extendr-wrappers.R
- All errors surfaced as R conditions
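Since errors arrive as R conditions, they can be handled with the usual `tryCatch` machinery. The following is a minimal sketch: `safe_convert` is a hypothetical wrapper, and `failing_converter` is a stand-in that simulates a failure so the block runs without the package. The condition classes the binding actually raises are not documented here, so the handler catches the generic `error` class.

```r
# Hypothetical wrapper (not part of htmltomarkdown): return NA instead of
# aborting when the converter signals an error condition.
safe_convert <- function(html, converter) {
  tryCatch(
    converter(html),
    error = function(e) {
      warning("conversion failed: ", conditionMessage(e), call. = FALSE)
      NA_character_
    }
  )
}

# Stand-in converter that always fails; in practice one would pass
# htmltomarkdown::convert instead.
failing_converter <- function(html) stop("simulated Rust-side error")

result <- suppressWarnings(safe_convert("<p>x</p>", failing_converter))
print(is.na(result))
```

Returning `NA_character_` keeps batch results aligned with their inputs, so failed documents can be found afterwards with `is.na()`.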
See Also
- Configuration Reference -- full options documentation
- Types Reference -- cross-language type definitions