UnicodeDecodeError: 'utf-8' codec can't decode byte
I was trying to convert an object to JSON, but it kept raising an error:
json_txt = json.dumps(log_txt)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte
The cause: the object contains some garbled bytes that are not valid UTF-8, so the 'utf-8' codec can't decode them.
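To see the failure in isolation, here is a minimal Python 3 sketch that decodes a hypothetical byte string. It assumes (the post doesn't say) that the stray 0xe9 came from Latin-1 text such as "é". In UTF-8, 0xe9 starts a three-byte sequence, so when it is followed by a plain space the decoder reports the same kind of error seen above. The helper try_utf8 is hypothetical, written only for this illustration.

```python
def try_utf8(data: bytes) -> str:
    """Hypothetical helper: decode data as UTF-8, or return the error message if it fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)


# Hypothetical bytes: 0xe9 is "é" in Latin-1, but here it is not a valid UTF-8 sequence.
raw = b"caf\xe9 au lait"
print(try_utf8(raw))            # 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
print(try_utf8(b"caf\xc3\xa9"))  # café (the proper UTF-8 encoding of "é" is C3 A9)
```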
Fix:
json_txt = json.dumps(log_txt, ensure_ascii=False)
Adding that one parameter makes the error go away; I'll dig into why later.
################################ divider ################################
A few days later I dug into the cause and looked it up in the manual (the Python 2 json documentation):
json.dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding="utf-8", default=None, sort_keys=False, **kw)
Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using this conversion table.
If skipkeys is true (default: False), then dict keys that are not of a basic type (str, unicode, int, long, float, bool, None) will be skipped instead of raising a TypeError.
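A quick check of skipkeys, sketched under Python 3 rather than the Python 2 manual quoted here (in Python 3 the basic key types are str, int, float, bool and None, since unicode and long no longer exist). The dict is a hypothetical example with one tuple key.

```python
import json

# Hypothetical log record with one non-basic key (a tuple).
data = {"level": "info", (1, 2): "tuple key"}

# skipkeys=False (the default): the tuple key raises TypeError.
try:
    json.dumps(data)
except TypeError as exc:
    print("TypeError:", exc)

# skipkeys=True: the tuple key is silently dropped from the output.
print(json.dumps(data, skipkeys=True))  # {"level": "info"}
```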
If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is false, some chunks written to fp may be unicode instances. This usually happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error.
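In plain terms, the last paragraph says: if ensure_ascii is True (the default), every non-ASCII character in the output is escaped as \uXXXX, and the return value of dumps is a str containing only ASCII characters.
If ensure_ascii is False, some of the chunks written to fp may be unicode instances. (I didn't understand this sentence at first and thought it meant fp itself had to be a unicode instance; it actually means the pieces that json.dump passes to fp.write() may be unicode objects rather than byte strings.)
This happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly accepts unicode (as a writer from codecs.getwriter() does), this is likely to cause an error.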
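The Python 2 behaviour described here can't be replayed exactly in Python 3, where json.dump always writes str chunks. Assuming that setting, the closest analogue is that fp.write() must accept text: a bare io.BytesIO would reject it with TypeError, but wrapping it with codecs.getwriter("utf-8"), the helper the manual mentions, encodes each chunk to UTF-8 bytes. The payload obj is hypothetical.

```python
import codecs
import io
import json

obj = {"msg": "你好"}  # hypothetical payload with non-ASCII text

# ensure_ascii=True (default): pure-ASCII output, non-ASCII chars become \uXXXX escapes.
escaped = json.dumps(obj)
print(escaped)             # {"msg": "\u4f60\u597d"}
print(escaped.isascii())   # True

# ensure_ascii=False: chunks handed to fp.write() contain raw non-ASCII text,
# so the byte stream is wrapped in a StreamWriter that encodes each chunk to UTF-8.
buf = io.BytesIO()
writer = codecs.getwriter("utf-8")(buf)
json.dump(obj, writer, ensure_ascii=False)
print(buf.getvalue())      # b'{"msg": "\xe4\xbd\xa0\xe5\xa5\xbd"}'
```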
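So my error was an encoding problem. With the default ensure_ascii=True, json has to turn every non-ASCII character into a \uXXXX escape, so it first decodes each byte string using the encoding parameter (UTF-8 by default). The data I was serializing contains a byte, 0xe9, that does not form a valid UTF-8 sequence at that position, so the decode fails with the 'utf-8' codec error above. With ensure_ascii=False, byte strings are written through unchanged, so nothing is decoded and the error disappears; the 0xe9 byte simply ends up in the output as-is, which means the output still contains that invalid UTF-8 byte.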
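Python 3's json.dumps no longer accepts byte strings at all (it raises TypeError), so the Python 2 code path can't be reproduced directly. The sketch below illustrates the underlying point instead: a \uXXXX escape needs a code point, and bytes only yield code points under a decoding that fits them. It assumes (an assumption; the post doesn't say where the byte came from) that 0xe9 is Latin-1 "é"; the helper to_json is hypothetical.

```python
import json


def to_json(raw: bytes, encoding: str, ensure_ascii: bool = True) -> str:
    """Hypothetical helper: decode bytes with an explicit encoding, then serialize."""
    text = raw.decode(encoding)  # bytes -> str (code points); may raise UnicodeDecodeError
    return json.dumps(text, ensure_ascii=ensure_ascii)


raw = b"caf\xe9"  # hypothetical log bytes; 0xe9 assumed to be Latin-1 "é" (U+00E9)
print(to_json(raw, "latin-1"))                      # "caf\u00e9"
print(to_json(raw, "latin-1", ensure_ascii=False))  # "café"
```

Calling to_json(raw, "utf-8") on the same bytes raises UnicodeDecodeError, because under UTF-8 the byte 0xe9 never maps to a code point that could be escaped.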
For example:
print(json.dumps("你好"))                      # outputs "\u4f60\u597d"
print(json.dumps("你好", ensure_ascii=False))  # outputs "你好"