Ruby API Reference v2.5.1
Gem: html-to-markdown | Version: 2.28.1 | Ruby: 3.2+
Installation
Install the gem with `gem install html-to-markdown`, or add `gem 'html-to-markdown'` to your Gemfile.
Module Methods
All methods are available on the HtmlToMarkdown module.
HtmlToMarkdown.convert
Convert HTML to Markdown.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| html | String | The HTML string to convert |
| options | Hash, nil | Optional conversion options as a Hash |
| visitor | Object, nil | Optional visitor for custom conversion logic |
Returns: String -- the converted Markdown.
Raises: StandardError if conversion fails.
Example:
require 'html_to_markdown'
html = "<h1>Hello</h1><p>World</p>"
markdown = HtmlToMarkdown.convert(html)
# With options
markdown = HtmlToMarkdown.convert(html, { heading_style: "atx" })
# With visitor
class SkipImages
  def visit_image(ctx, src, alt, title)
    { "type" => "skip" }
  end
end
markdown = HtmlToMarkdown.convert(html, nil, SkipImages.new)
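Because convert raises StandardError on failure, callers that process untrusted HTML may want a fallback. The following is a minimal sketch, not part of the gem: `convert_or_fallback` is a hypothetical helper, and the two lambdas are stand-in converters so the snippet runs without the gem installed. In real use the converter would presumably be `HtmlToMarkdown.method(:convert)`.

```ruby
# Hypothetical helper (not part of html-to-markdown): calls any converter
# callable and returns a fallback string if it raises StandardError.
def convert_or_fallback(html, converter:, fallback: "")
  converter.call(html)
rescue StandardError => e
  warn "conversion failed: #{e.message}"
  fallback
end

# Stand-in converters for illustration only.
failing = ->(_html) { raise StandardError, "malformed input" }
working = ->(html) { html.upcase }

puts convert_or_fallback("<p>x</p>", converter: working)
puts convert_or_fallback("<p>x</p>", converter: failing, fallback: "n/a")
```

Passing the converter in as a callable keeps the error handling separate from the conversion call and makes it easy to test.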
HtmlToMarkdown.convert_with_metadata
Convert HTML to Markdown with metadata extraction.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| html | String | The HTML string to convert |
| options | Hash, nil | Optional conversion options |
| metadata_config | Hash, nil | Metadata extraction configuration |
Returns: Array(String, Hash) -- a two-element array of [markdown_string, metadata_hash].
Example:
html = <<~HTML
  <html lang="en">
    <head><title>My Article</title></head>
    <body>
      <h1 id="intro">Introduction</h1>
      <p>Visit <a href="https://example.com">our site</a></p>
      <img src="photo.jpg" alt="Landscape">
    </body>
  </html>
HTML
markdown, metadata = HtmlToMarkdown.convert_with_metadata(html)
puts metadata[:document][:title] # => "My Article"
puts metadata[:document][:language] # => "en"
puts metadata[:headers].length # => 1
puts metadata[:headers][0][:text] # => "Introduction"
puts metadata[:links].length # => 1
puts metadata[:images].length # => 1
# Selective extraction
config = {
  extract_headers: true,
  extract_links: true,
  extract_images: false,
  extract_structured_data: false,
}
markdown, metadata = HtmlToMarkdown.convert_with_metadata(html, nil, config)
HtmlToMarkdown.convert_with_visitor
Convert HTML with a visitor object. The visitor is passed as the third argument to convert.
HtmlToMarkdown.options
Create a reusable, pre-compiled options handle.
HtmlToMarkdown.convert_with_options
Convert using a pre-compiled options handle.
Example:
handle = HtmlToMarkdown.options({ heading_style: "atx", wrap: true })
html_documents.each do |html|
  markdown = HtmlToMarkdown.convert_with_options(html, handle)
end
Options Hash
Options are passed as a Ruby Hash with symbol keys. All fields are optional.
options = {
  heading_style: "atx",          # "underlined", "atx", "atx_closed"
  list_indent_type: "spaces",    # "spaces", "tabs"
  list_indent_width: 4,
  bullets: "-",
  strong_em_symbol: "*",
  escape_asterisks: false,
  escape_underscores: false,
  escape_misc: false,
  code_language: "",
  autolinks: true,
  whitespace_mode: "normalized", # "normalized", "strict"
  strip_newlines: false,
  wrap: false,
  wrap_width: 80,
  newline_style: "spaces",       # "spaces", "backslash"
  code_block_style: "indented",  # "indented", "backticks", "tildes"
  preserve_tags: [],
  strip_tags: [],
  skip_images: false,
  output_format: "markdown",     # "markdown", "djot", "plain"
}
See the Configuration Reference for detailed descriptions.
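Since the enumerated options accept only the string values listed in the comments above, it can help to check a Hash before handing it to the converter. This is a rough sketch in plain Ruby; `ALLOWED_OPTION_VALUES` and `option_problems` are hypothetical names, not part of the gem, and the allowed values are copied from the listing above rather than from the library itself.

```ruby
# Allowed values copied from the comments in the options listing;
# the Configuration Reference remains the authoritative source.
ALLOWED_OPTION_VALUES = {
  heading_style: %w[underlined atx atx_closed],
  list_indent_type: %w[spaces tabs],
  whitespace_mode: %w[normalized strict],
  newline_style: %w[spaces backslash],
  code_block_style: %w[indented backticks tildes],
  output_format: %w[markdown djot plain],
}.freeze

# Hypothetical helper: returns a list of problems found in an options Hash,
# or an empty array when nothing looks wrong.
def option_problems(options)
  options.filter_map do |key, value|
    if !key.is_a?(Symbol)
      "#{key.inspect}: keys must be symbols"
    elsif ALLOWED_OPTION_VALUES.key?(key) && !ALLOWED_OPTION_VALUES[key].include?(value)
      "#{key}: #{value.inspect} is not one of #{ALLOWED_OPTION_VALUES[key].join(', ')}"
    end
  end
end

problems = option_problems({ heading_style: "setext", "wrap" => true, wrap_width: 100 })
puts problems
```

The check covers only symbol keys and the enumerated string options. Numeric and boolean fields are passed through unchecked.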
Metadata Config Hash
metadata_config = {
  extract_document: true,
  extract_headers: true,
  extract_links: true,
  extract_images: true,
  extract_structured_data: true,
  max_structured_data_size: 1_000_000,
}
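When only a few extractors are needed, the config Hash can be generated rather than written out by hand. The sketch below assumes that every `extract_*` flag is a plain boolean, as the listing above suggests. `EXTRACTORS` and `metadata_config_only` are hypothetical names, not part of the gem.

```ruby
# Extractor names taken from the extract_* keys in the listing above.
EXTRACTORS = %i[document headers links images structured_data].freeze

# Hypothetical helper: enables only the named extractors and disables the rest.
def metadata_config_only(*wanted)
  unknown = wanted - EXTRACTORS
  raise ArgumentError, "unknown extractors: #{unknown.join(', ')}" unless unknown.empty?

  EXTRACTORS.to_h { |name| [:"extract_#{name}", wanted.include?(name)] }
end

config = metadata_config_only(:headers, :links)
p config
```

The resulting Hash could then be passed as the third argument to convert_with_metadata, as in the selective extraction example earlier.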
Metadata Result Structure
The metadata Hash returned by convert_with_metadata has the following structure:
{
  document: {
    title: String | nil,
    description: String | nil,
    keywords: Array<String>,
    author: String | nil,
    language: String | nil,
    text_direction: String | nil, # "ltr", "rtl", "auto"
    canonical_url: String | nil,
    open_graph: Hash<String, String>,
    twitter_card: Hash<String, String>,
    meta_tags: Hash<String, String>,
  },
  headers: [
    { level: Integer, text: String, id: String | nil, depth: Integer, html_offset: Integer },
  ],
  links: [
    { href: String, text: String, title: String | nil, link_type: String, rel: Array<String>, attributes: Hash },
  ],
  images: [
    { src: String, alt: String | nil, title: String | nil, dimensions: Array | nil, image_type: String, attributes: Hash },
  ],
  structured_data: [
    { data_type: String, raw_json: String, schema_type: String | nil },
  ],
}
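As an illustration of consuming this structure, the sketch below builds a table of contents from the headers array and collects absolute link targets. The `metadata` Hash is a hand-written sample containing only the fields the code uses; it is not real converter output. `table_of_contents` is a hypothetical helper.

```ruby
# Hand-written sample shaped like the documented result (subset of fields).
metadata = {
  headers: [
    { level: 1, text: "Introduction", id: "intro" },
    { level: 2, text: "Setup", id: nil },
  ],
  links: [
    { href: "https://example.com", text: "our site" },
    { href: "#intro", text: "top" },
  ],
}

# Hypothetical helper: renders headers as a nested Markdown list,
# linking to the header id when one is present.
def table_of_contents(headers)
  headers.map do |h|
    indent = "  " * (h[:level] - 1)
    label = h[:id] ? "[#{h[:text]}](##{h[:id]})" : h[:text]
    "#{indent}- #{label}"
  end.join("\n")
end

toc = table_of_contents(metadata[:headers])
absolute_links = metadata[:links]
  .map { |link| link[:href] }
  .select { |href| href.start_with?("http://", "https://") }

puts toc
puts absolute_links
```

Headers without an id are listed as plain text, since the id field may be nil.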
Visitor Protocol
Visitor objects are plain Ruby objects. Define methods matching the callbacks you need. Each method should return a Hash with a "type" key.
class MyVisitor
  def visit_link(ctx, href, text, title)
    { "type" => "custom", "output" => "#{text} (#{href})" }
  end

  def visit_image(ctx, src, alt, title)
    { "type" => "skip" }
  end
end
See the Visitor Pattern Guide for available callbacks.
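Because visitors are plain Ruby objects, they can carry state and be exercised directly, without running a conversion. The sketch below is a hypothetical `AbsoluteLinks` visitor that resolves relative hrefs against a base URL. It assumes the callback receives href, text, and title as Strings (or nil for title), and it uses only the "custom" result type shown above.

```ruby
require "uri"

# Hypothetical visitor: resolves relative hrefs against a base URL and
# emits the link as Markdown through the "custom" result type.
class AbsoluteLinks
  def initialize(base_url)
    @base_url = base_url
  end

  def visit_link(_ctx, href, text, title)
    absolute = URI.join(@base_url, href).to_s
    suffix = title ? %( "#{title}") : ""
    { "type" => "custom", "output" => "[#{text}](#{absolute}#{suffix})" }
  end
end

# Call the callback directly; ctx is unused here, so nil stands in for it.
visitor = AbsoluteLinks.new("https://example.com/blog/")
result = visitor.visit_link(nil, "/docs", "Docs", nil)
puts result["output"]
```

An instance would be passed as the visitor argument to convert. Note that URI.join raises URI::InvalidURIError for hrefs it cannot parse, so production code may want to rescue that and return the original href.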
See Also
- Configuration Reference -- full options documentation
- Types Reference -- cross-language type definitions