python - How to parse unicode strings with minidom?


I am trying to parse a group of XML files with the library xml.dom.minidom, extracting some data and putting it in a text file. Most of the XML files parse fine, but for some of them I get the following error when I call minidom.parseString():

UnicodeEncodeError: 'ascii' codec can't encode character … in position …: ordinal not in range(128)

It also happens with some other non-ASCII characters. My question is: what are my options here? Do I have to strip or replace all those non-English characters somehow before I can parse the XML files?

Try decoding it:

    > print u'abcdé'.encode('utf-8')
    > abcdÃ©
    > print u'abcdé'.encode('utf-8').decode('utf-8')
    > abcdé
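Applied to the minidom case, the same idea might look like the sketch below. It assumes the XML is already held as a unicode string and that any encoding declaration in the document is either absent or says UTF-8. The sample document and the helper name parse_unicode_xml are made up for illustration; they are not from the question.

```python
from xml.dom import minidom

# Hypothetical sample document containing a non-ASCII character (U+2019).
xml_text = u"<root><title>It\u2019s a test</title></root>"


def parse_unicode_xml(text):
    """Encode the unicode text to UTF-8 bytes, then pass the bytes to minidom.

    Handing minidom an explicit byte string avoids any implicit
    conversion with the default 'ascii' codec.
    """
    return minidom.parseString(text.encode("utf-8"))


doc = parse_unicode_xml(xml_text)

# Text read back from the DOM is unicode again.
title = doc.getElementsByTagName("title")[0].firstChild.data
```

If the XML declaration names a different encoding (for example ISO-8859-1), the bytes would have to be encoded with that codec instead, otherwise the parser will misread them.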
