dcc_json_toolkit.dcc_source_content

Handle for XML files and their content (as text).

DccSourceContent

DccSourceContent(xml_source: str | Path)

Container for any complete or partial DCC file.

The class reads the file only once and keeps its contents in memory so that they can be processed further.

The class does not inspect or validate the file; it assumes that any XML data it receives is DCC data.

Parameters:
  • xml_source (str | Path) –

    The source of the DCC, which can be either the file path to the DCC or a string with its contents.
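The following is a minimal sketch of how a `str | Path` argument like this could be resolved to text in a single read. It is not the toolkit's implementation: the helper name `load_xml_source` and the rule "a string starting with `<` is XML content, anything else is a path" are assumptions made for illustration only.

```python
from __future__ import annotations

from pathlib import Path


def load_xml_source(xml_source: str | Path) -> str:
    """Return the XML text for a path or for a string that already holds XML.

    Hypothetical helper: the heuristic below is an assumption, not the
    documented behaviour of DccSourceContent.
    """
    # A Path object is unambiguous: read the file once.
    if isinstance(xml_source, Path):
        return xml_source.read_text(encoding="utf-8")
    # Assumption: XML content starts with '<' after optional whitespace.
    if xml_source.lstrip().startswith("<"):
        return xml_source
    # Otherwise treat the string as a file path.
    return Path(xml_source).read_text(encoding="utf-8")
```

With a helper of this kind, callers can pass either form and always work with a plain string afterwards.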

content property

content: str

The contents of the file.

is_subschema property

is_subschema: bool

Whether the DCC contents correspond to a partial DCC schema.

get_first_tag

get_first_tag() -> str | None

Extract the name of the first tag found in the document.

The function returns only the local name of the tag, without the namespace prefix, e.g. dcc:list -> 'list', si:real -> 'real'. If no valid tag is found, the function returns None.
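The prefix-stripping behaviour described above can be illustrated with a standalone regular expression. This is only a sketch: `first_tag_local_name` is a hypothetical function, and the real method may find the tag differently (for example, this version does not skip tags that appear inside comments).

```python
from __future__ import annotations

import re

# '<' followed by a name, optionally followed by ':' and a second name.
# '<?xml ...?>', '<!-- ... -->' and closing tags do not match because the
# character after '<' must be a letter or underscore.
_FIRST_TAG = re.compile(r"<([A-Za-z_][\w.\-]*)(?::([A-Za-z_][\w.\-]*))?")


def first_tag_local_name(xml_text: str) -> str | None:
    """Return the first tag's name without its namespace prefix, or None."""
    match = _FIRST_TAG.search(xml_text)
    if match is None:
        return None
    # If a prefix was present, group 2 holds the local name; otherwise group 1.
    return match.group(2) or match.group(1)
```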

get_namespaces

get_namespaces() -> dict[str, str]

Extract all namespaces declared in the content.
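As an illustration, namespace declarations can be collected from raw XML text as sketched below. The source does not state what the keys and values of the returned dictionary are; this sketch assumes a prefix-to-URI mapping with `""` as the key for a default namespace. The function name `collect_namespaces` is hypothetical.

```python
import re

# Matches xmlns="uri" and xmlns:prefix="uri", with single or double quotes.
_XMLNS = re.compile(r'\bxmlns(?::([\w.\-]+))?\s*=\s*(["\'])(.*?)\2')


def collect_namespaces(xml_text: str) -> dict[str, str]:
    """Map each declared prefix to its URI (assumed layout, see lead-in)."""
    namespaces: dict[str, str] = {}
    for match in _XMLNS.finditer(xml_text):
        prefix = match.group(1) or ""  # "" stands for the default namespace
        # Keep the first declaration if a prefix is declared more than once.
        namespaces.setdefault(prefix, match.group(3))
    return namespaces
```

A text-based scan like this also works on partial documents that a strict XML parser would reject because of undeclared prefixes.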

get_schema_locations

get_schema_locations() -> list[tuple[str, str]]

Extract a list of all schema locations.

A valid XML document may reference one or more XSDs, which should be declared in the schemaLocation attribute as xsi:schemaLocation="{dcc_link} {dcc_xsd_link} {ns_link} {xsd_link}". This function searches for these declarations and returns a list of the XSD links, each paired with its namespace.

Returns:
  • schema_locations (list[tuple[str, str]]) –

    A list where each item contains first the namespace, then the URI of the schema. For example, a DCC without any other schema locations would return: [("https://ptb.de/dcc", "https://ptb.de/dcc/v3.3.0/dcc.xsd")]
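The pairing of namespace and XSD link follows from the layout of the schemaLocation attribute, whose value is a whitespace-separated sequence of alternating namespace and schema URIs. A minimal sketch of that pairing, using a hypothetical standalone function `pair_schema_locations` rather than the toolkit's own code:

```python
import re


def pair_schema_locations(xml_text: str) -> list[tuple[str, str]]:
    """Return (namespace, xsd_uri) pairs from the first schemaLocation attribute."""
    # Case-sensitive on purpose: this does not match noNamespaceSchemaLocation.
    match = re.search(
        r'\bschemaLocation\s*=\s*(["\'])(.*?)\1', xml_text, re.DOTALL
    )
    if match is None:
        return []
    # The attribute value alternates: namespace, schema URI, namespace, ...
    tokens = match.group(2).split()
    # zip drops a trailing unpaired token, if any.
    return list(zip(tokens[0::2], tokens[1::2]))
```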

get_schema_version

get_schema_version() -> str | None

Extract the schema version of the document.

The version is specified in the schemaVersion attribute, which appears only in complete DCC files. A subschema returns None because it does not specify a version.
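The difference between a complete file and a subschema can be illustrated with a standalone lookup of the schemaVersion attribute. This is a sketch under the assumption that a plain text search is sufficient; `find_schema_version` is a hypothetical name, not part of the toolkit.

```python
from __future__ import annotations

import re


def find_schema_version(xml_text: str) -> str | None:
    """Return the value of the first schemaVersion attribute, or None."""
    match = re.search(r'\bschemaVersion\s*=\s*(["\'])(.*?)\1', xml_text)
    # Fragments without the attribute yield None.
    return match.group(2) if match else None
```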