
Markdown

This section contains the reference for the implementation of translate-md's MarkdownProcessor and helper functions.

MarkdownProcessor

MarkdownProcessor(markdown_content: str) -> None

Class for working with a markdown file and extracting the text content to be translated.

The markdown file is expected to follow the format Hugo uses for blog posts.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| markdown_content | str | The content of a markdown file as a string. | required |
Notes

See gohugo for the type of markdown files expected.

tokens property

tokens: list[Token]

Parsed pieces of the markdown file. The content is extracted from these pieces, updated, and then reassembled into markdown.

get_pieces

get_pieces() -> list[str]

Gets the pieces of the markdown file to be translated.

The relevant pieces are the tokens of type 'inline' that are not front matter, a figure, code, or a markdown comment.

The positions of the corresponding tokens are stored internally for later use, so that update can write the translated texts back to the same tokens.
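A minimal sketch of the filtering and position bookkeeping described above, assuming each token exposes a type and a content attribute (as markdown-it-py's Token does). The Token dataclass is a hypothetical stand-in, and the skip predicate stands in for the is_front_matter, is_figure, is_code and is_comment checks; this is an illustration, not translate-md's actual code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Token:
    """Hypothetical stand-in for a parsed markdown token."""
    type: str
    content: str


def get_pieces(
    tokens: list[Token], skip: Callable[[str], bool]
) -> tuple[list[str], list[int]]:
    """Collect translatable texts and remember which tokens they came from."""
    pieces: list[str] = []
    positions: list[int] = []
    for index, token in enumerate(tokens):
        # Only 'inline' tokens carry paragraph text; `skip` filters out
        # front matter, figures, code and comments.
        if token.type == "inline" and not skip(token.content):
            pieces.append(token.content)
            positions.append(index)
    return pieces, positions


tokens = [
    Token("paragraph_open", ""),
    Token("inline", "Hello"),
    Token("paragraph_close", ""),
    Token("inline", "![logo](logo.png)"),
    Token("inline", "World"),
]
pieces, positions = get_pieces(tokens, skip=lambda text: text.startswith("!["))
print(pieces, positions)  # ['Hello', 'World'] [1, 4]
```

Keeping the positions next to the texts is what lets the translated list be mapped back one-to-one later.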

render

render() -> str

Return a new markdown document, as a string, with the translated paragraphs.

It takes no arguments: the translated texts are inserted beforehand with update.

update

update(texts: list[str]) -> None

Update the content with the translated pieces.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | List of translated texts to insert back. | required |

Raises:

| Type | Description |
|---|---|
| ValueError | If the number of texts does not match the number of pieces obtained from the get_pieces method. |

See Also

get_pieces
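A sketch of how update could write texts back and enforce the documented ValueError, assuming the token positions recorded by get_pieces are available. The Token dataclass and the free-function form of update are hypothetical stand-ins for the class internals.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a parsed markdown token."""
    type: str
    content: str


def update(tokens: list[Token], positions: list[int], texts: list[str]) -> None:
    """Write translated texts back into the tokens recorded by get_pieces."""
    # One translated text is required per extracted piece.
    if len(texts) != len(positions):
        raise ValueError(f"Expected {len(positions)} texts, got {len(texts)}.")
    for position, text in zip(positions, texts):
        tokens[position].content = text


tokens = [Token("inline", "Hola"), Token("fence", "x = 1"), Token("inline", "Adiós")]
update(tokens, positions=[0, 2], texts=["Hello", "Goodbye"])
print([token.content for token in tokens])  # ['Hello', 'x = 1', 'Goodbye']
```

The length check runs before any token is modified, so a mismatched list leaves the content untouched.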

write_to

write_to(filename: Path) -> None

Write the content of the updated markdown to disk.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| filename | Path | Name of the new file. | required |

read_file

read_file(filename: Path) -> str

Helper function that reads a whole markdown file into a string.
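A sketch of the read_file helper using pathlib, paired with a hypothetical free-function counterpart of the write_to method. UTF-8 is an assumption; the encoding used by translate-md is not documented here.

```python
from pathlib import Path


def read_file(filename: Path) -> str:
    """Read a whole markdown file into a string."""
    # UTF-8 is assumed; the documented helper does not state an encoding.
    return Path(filename).read_text(encoding="utf-8")


def write_to(content: str, filename: Path) -> None:
    """Hypothetical free-function counterpart of MarkdownProcessor.write_to."""
    Path(filename).write_text(content, encoding="utf-8")
```

Writing a string with write_to and reading it back with read_file returns the same text.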

is_front_matter

is_front_matter(text: str) -> bool

Check whether a token belongs to the front matter.

The check tests whether the string starts with '---' followed by a single line break and the word 'title' (it fails if any space is inserted between them), and whether it ends with '---'.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text obtained from the Token's content. Intended to be applied to the tokens of a parsed markdown file. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if the text belongs to the front matter, False otherwise. |
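A sketch of the front-matter check exactly as described above, written with plain string methods; it is reconstructed from the prose, not copied from translate-md's source.

```python
def is_front_matter(text: str) -> bool:
    """Check whether a token's text looks like Hugo-style front matter."""
    # '---', exactly one line break, then 'title'; any inserted space breaks the match.
    return text.startswith("---\ntitle") and text.endswith("---")


print(is_front_matter("---\ntitle: My post\ndraft: false\n---"))  # True
print(is_front_matter("--- \ntitle: My post\n---"))  # False: space before the line break
```

The strict prefix is what makes the check fragile: front matter whose first key is not title, or that has trailing spaces after the opening '---', is not recognized.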

is_figure

is_figure(text: str) -> bool

Check whether a paragraph is just a picture in the document.

Some lines contain nothing but a picture (e.g. 'helpner'), and there is no reason to translate those. The check is not perfect; it just fits my needs.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text obtained from the Token's content. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if the paragraph is just a picture, False otherwise. |
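One possible sketch of this kind of check, assuming a "picture" means a paragraph consisting of a single markdown image, ![alt](src). The regular expression is hypothetical; the actual check in translate-md is not shown here, and Hugo's figure shortcode is not covered.

```python
import re

# Hypothetical pattern: the whole paragraph is one markdown image, ![alt](src).
IMAGE_ONLY = re.compile(r"^\s*!\[[^\]]*\]\([^)]*\)\s*$")


def is_figure(text: str) -> bool:
    """Return True if the paragraph contains nothing but one markdown image."""
    return IMAGE_ONLY.match(text) is not None


print(is_figure("![diagram](images/diagram.png)"))  # True
print(is_figure("The ![icon](icon.png) sits inline."))  # False
```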

is_code

is_code(text: str) -> bool

Check if a blob of text is a chunk of code.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text obtained from the Token's content. | required |

Returns:

| Type | Description |
|---|---|
| bool | True if the text is a chunk of code, False otherwise. |
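A heuristic sketch of a code check, assuming code shows up either as a fenced block or as a block whose non-blank lines are all indented. This heuristic is an assumption for illustration; the documentation does not say how translate-md decides.

```python
def is_code(text: str) -> bool:
    """Heuristic: treat fenced or fully indented blocks as code."""
    # Fenced code opens with three backticks or three tildes.
    if text.lstrip().startswith(("```", "~~~")):
        return True
    lines = [line for line in text.splitlines() if line.strip()]
    # Indented code: every non-blank line starts with four spaces or a tab.
    return bool(lines) and all(line.startswith(("    ", "\t")) for line in lines)


print(is_code("    x = 1\n    y = 2"))  # True
print(is_code("Plain prose, not code."))  # False
```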

is_comment

is_comment(text: str) -> bool