Parser chokes when text contains slash

I am using html_diff in an application that compares two XML files and writes a report with the differences: the text content of each node is extracted from the two files using xml.dom.minidom, and then compared with diff(string1, string2).
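For context, the extraction step looks roughly like the sketch below. It is a simplified illustration, not the application's actual code: the helper names `node_text` and `texts_by_tag` and the `seg` element name are hypothetical, and the call to `diff` is left as a comment so the snippet runs with the standard library only.

```python
from xml.dom import minidom


def node_text(node):
    """Concatenate the text of all text and CDATA descendants of `node`."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(node_text(child))
    return "".join(parts)


def texts_by_tag(xml_string, tag):
    """Return the text content of every `tag` element, in document order."""
    doc = minidom.parseString(xml_string)
    return [node_text(el) for el in doc.getElementsByTagName(tag)]


# Hypothetical sample data; "seg" is an assumed element name.
old_texts = texts_by_tag("<root><seg>read/write access</seg></root>", "seg")
new_texts = texts_by_tag("<root><seg>read/write permissions</seg></root>", "seg")

for string1, string2 in zip(old_texts, new_texts):
    if string1 != string2:
        # The application would call diff(string1, string2) from html_diff here.
        print(f"{string1!r} -> {string2!r}")
```

Note that the extracted strings are plain text with no markup, which is the kind of input that reaches `diff`.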

It works well in most cases, but we have found that it fails with the following message if the text contains a slash ("/"):

.../pisa25-diff-target-xml/venv/lib/python3.11/site-packages/html_diff/__init__.py:205: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.
  a_soup = bs4.BeautifulSoup(a, "html.parser")
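A possible stopgap on the caller's side, assuming the message is only a warning and the diff result itself is still usable (I have not confirmed this), is to silence it around the call. The sketch below uses only the standard `warnings` module and matches on the message text shown above; it relies on `MarkupResemblesLocatorWarning` being a subclass of `UserWarning`. The name `quiet_locator_warning` is hypothetical.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_locator_warning():
    """Temporarily ignore the 'looks more like a filename' warning."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the
        # warning text; filters added here are discarded on exit.
        warnings.filterwarnings(
            "ignore",
            message=r"The input looks more like a filename",
            category=UserWarning,
        )
        yield


# Intended use in the application (diff comes from html_diff):
#
#     with quiet_locator_warning():
#         result = diff(string1, string2)
```

Filtering by message rather than importing the warning class keeps the snippet independent of the installed Beautiful Soup version, at the cost of breaking if the wording of the message changes. A fix inside html_diff would still be preferable.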

Attached is a sample file, sample_ar-PS.xml.

Here is the application pisa25-diff-target-xml for reference.

Edited Feb 29, 2024 by Manuel Souto Pico