I am trying to parse a bunch of XML files with the library xml.dom.minidom, to extract some data and put it in a text file. Most of the XML files parse fine, but for some of them I get the following error when calling minidom.parseString():
UnicodeEncodeError: 'ascii' codec can't encode character ... in position ...: ordinal not in range(128)
The same happens with some other non-ASCII characters. My question is: what are my options here? Do I have to strip or replace all those non-English characters somehow before I can parse the XML files?
Try decoding it:
    > print u'abcdé'.encode('utf-8')
    > abcdĂ©
    > print u'abcdé'.encode('utf-8').decode('utf-8')
    > abcdé
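Applied to the parsing step in the question, one possible approach is to encode the text to UTF-8 bytes explicitly before handing it to minidom.parseString(), so that no implicit ASCII conversion takes place. This is only a sketch: the sample document and the names xml_text, xml_bytes and body are made up for illustration, and it assumes the XML is already held as a text (unicode) string.

```python
from xml.dom import minidom

# Hypothetical sample document (not from the question). It contains a right
# single quotation mark (U+2019) and an e-acute (U+00E9), both non-ASCII.
xml_text = u"<note><body>It\u2019s caf\u00e9 time</body></note>"

# Encode to UTF-8 bytes explicitly instead of relying on an implicit
# ASCII conversion inside the parser.
xml_bytes = xml_text.encode("utf-8")

doc = minidom.parseString(xml_bytes)

# Text nodes come back as text (unicode) strings, with the non-ASCII
# characters intact.
body = doc.getElementsByTagName("body")[0].firstChild.data
```

When writing the extracted data to a text file, opening the file with an explicit encoding (for example io.open(path, "w", encoding="utf-8")) should avoid the same error on output, so nothing needs to be stripped or replaced.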