UnicodeDecodeError: 'utf-8' codec can't decode byte

薄洪涛 (2018-12-29) · Python

I was trying to convert an object to JSON, but kept getting an error:

json_txt = json.dumps(log_txt)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

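The cause: the object contains some garbled characters, bytes (0xe9 here) that are not valid UTF-8, so the utf-8 codec cannot decode them.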
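The code in this post is Python 2, where json.dumps decodes byte strings internally. As a rough Python 3 illustration of the same failure (the post's data isn't shown, so the bytes below are made up, assuming the text was Latin-1, where "é" is the single byte 0xe9), decoding the bytes directly as UTF-8 raises the same kind of error:

```python
# Hypothetical input: "café au lait" encoded as Latin-1, where "é" is the single byte 0xe9
raw = b"caf\xe9 au lait"

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xe9 is a UTF-8 lead byte for a 3-byte sequence, but the byte after it
    # (0x20, a space) is not a continuation byte of the form 10xxxxxx
    error = err
    print(err)  # 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

# The same bytes decode cleanly with the encoding they were actually written in
print(raw.decode("latin-1"))  # café au lait
```

The "invalid continuation byte" wording in the original traceback points the same way: the reported byte starts a multi-byte UTF-8 sequence that the following bytes don't complete.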

Solution:

json_txt = json.dumps(log_txt, ensure_ascii=False)

Adding this one parameter makes the error go away; I'll dig into why later.

################################ divider #########################################

A few days later I looked into the cause and checked the manual (this is the Python 2 json documentation):

json.dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True,
          cls=None, indent=None, separators=None, encoding="utf-8", default=None, sort_keys=False, **kw)

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using this conversion table.

If skipkeys is true (default: False), then dict keys that are not of a basic type (str, unicode, int, long, float, bool, None) will be skipped instead of raising a TypeError.

If ensure_ascii is true (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is false, some chunks written to fp may be unicode instances. This usually happens because the input contains unicode strings or the encoding parameter is used. Unless fp.write() explicitly understands unicode (as in codecs.getwriter()) this is likely to cause an error.

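In short: if ensure_ascii is True (the default), every non-ASCII character in the output is escaped as a \uXXXX sequence, and dumps returns a str made up of ASCII characters only.

If ensure_ascii is False, some of the pieces passed to fp.write() may be unicode objects rather than byte strings. (I didn't understand "some chunks written to fp may be unicode instances" at first and thought it meant fp itself had to be a unicode instance.) That happens when the input contains unicode strings or the encoding parameter is used, and unless fp.write() explicitly accepts unicode (as a writer from codecs.getwriter() does), it is likely to raise an error.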
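The warning above is about Python 2's split between byte strings and unicode. A minimal Python 3 sketch of the same idea (assuming Python 3; `data` and the file name `out.json` are made up for illustration): with ensure_ascii=False the output contains non-ASCII text, so the file handed to json.dump should be opened with an encoding that can represent it.

```python
import json
import os
import tempfile

data = {"greeting": "你好"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.json")
    # json.dump() in Python 3 writes str chunks, so fp must be a text file;
    # passing encoding="utf-8" explicitly plays the role codecs.getwriter() did in Python 2
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, ensure_ascii=False)

    # Read back both the decoded text and the raw bytes that ended up on disk
    with open(path, encoding="utf-8") as fp:
        raw_text = fp.read()
    with open(path, "rb") as fp:
        raw_bytes = fp.read()

print(raw_text)  # {"greeting": "你好"}
```

Without encoding=, open() falls back to the platform's default encoding, which on some systems (for example a Windows machine using cp1252) cannot represent 你好, so json.dump would fail with a UnicodeEncodeError instead.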

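So my error was an encoding problem: the data I was serializing contained bytes (such as 0xe9) that are not valid UTF-8. With the default ensure_ascii=True, json has to decode each byte string (as UTF-8, the default encoding parameter) before it can escape the non-ASCII characters, and that decode fails. With ensure_ascii=False, the string is written out as-is without being decoded, so no error is raised.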
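Passing ensure_ascii=False hides the decode failure rather than fixing the data: the undecodable byte still ends up in the JSON. A sketch of a sturdier alternative, assuming Python 3 and assuming the data is really Latin-1 (the post doesn't say which encoding produced 0xe9; GBK or something else is just as possible), is to decode the bytes with the right codec before serializing:

```python
import json

# Hypothetical input: bytes assumed to be Latin-1, not UTF-8
raw = b"caf\xe9 au lait"

# Option 1: decode with the encoding the data was actually written in
# (Latin-1 is an assumption here), then give json a real str
text = raw.decode("latin-1")
escaped = json.dumps({"msg": text})                       # ASCII-only, é becomes \u00e9
readable = json.dumps({"msg": text}, ensure_ascii=False)  # keeps é as-is

# Option 2: if the encoding is unknown, decode as UTF-8 and replace undecodable
# bytes with U+FFFD; serialization then always succeeds, but the original byte is lost
lossy = raw.decode("utf-8", errors="replace")

print(escaped)   # {"msg": "caf\u00e9 au lait"}
print(readable)  # {"msg": "café au lait"}
```

Option 1 is preferable whenever the source encoding can be determined; option 2 only guarantees that json.dumps won't fail.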

For example:

import json

print(json.dumps("你好"))
print(json.dumps("你好", ensure_ascii=False))

Output:
"\u4f60\u597d"
"你好"
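A small extension of the example (Python 3; str.isascii() needs 3.7 or later): both outputs are valid JSON for the same value, so ensure_ascii only changes how non-ASCII characters are spelled in the output, not what json.loads gives back.

```python
import json

escaped = json.dumps("你好")                       # "\u4f60\u597d"
readable = json.dumps("你好", ensure_ascii=False)  # "你好"

# Both decode back to the identical Python string
print(json.loads(escaped) == json.loads(readable) == "你好")  # True
# Only the escaped form is pure ASCII
print(escaped.isascii(), readable.isascii())  # True False
```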

码农日记

All rights reserved | Please credit the source when reposting

Powered By Z-BlogPHP. Theme by TOYEAN.