I recently ran into this error uploading images to AWS S3 using the boto3 package.

Parameter validation failed:
Non ascii characters found in S3 metadata for key "filename", value: "ACME™ Anvil.jpg".
S3 metadata can only contain ASCII characters.

The character in question was ™. Here’s a simplified example of the code.

import boto3

client = boto3.client("s3")

filename = "ACME™ Anvil.jpg"
metadata: dict[str, str] = {
    "filename": filename,
}
client.upload_fileobj(
    Fileobj=...,
    Bucket=...,
    Key=...,
    ExtraArgs={"Metadata": metadata},
)

I wanted to preserve the original filename, so I learnt to use the backslashreplace error handler when encoding the filename to ASCII. Note that I have to subsequently call .decode(), as both metadata keys and values must be str, not bytes.

metadata: dict[str, str] = {
    "filename": filename.encode("ascii", "backslashreplace").decode(),
}

The encoded result is now compatible with the S3 metadata requirements.

>>> filename.encode("ascii", "backslashreplace").decode()
'ACME\\u2122 Anvil.jpg'

And I could use unicode-escape to retrieve the original filename later.

>>> "ACME\\u2122 Anvil.jpg".encode("ascii").decode("unicode-escape")
'ACME™ Anvil.jpg'
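Wrapped up as a pair of small helpers, the round trip looks like this. The function names are my own, not part of boto3:

```python
def encode_metadata_value(value: str) -> str:
    # Escape any non-ASCII characters as \uXXXX / \UXXXXXXXX sequences,
    # then decode back to str, since S3 metadata values must be ASCII str.
    return value.encode("ascii", "backslashreplace").decode()


def decode_metadata_value(value: str) -> str:
    # Reverse the escaping to recover the original Unicode string.
    return value.encode("ascii").decode("unicode-escape")


original = "ACME™ Anvil.jpg"
escaped = encode_metadata_value(original)   # 'ACME\\u2122 Anvil.jpg'
assert decode_metadata_value(escaped) == original
```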

At first, I thought this didn't work for all Unicode characters, namely emoji ZWJ sequences, which use U+200D (ZERO WIDTH JOINER) to join multiple characters into a single glyph. For example:

>>> "😵‍💫".encode("ascii", "backslashreplace").decode()
'\\U0001f635\\u200d\\U0001f4ab'

>>> "\\U0001f635\\u200d\\U0001f4ab".encode("ascii").decode("unicode-escape")
'😵\u200d💫'

Thanks to Lawrence Hudson, who pointed out that it was the REPL showing the printable representation (repr) of the result, rather than the Unicode string itself. Using print() on the result makes this clear.

>>> print("\\U0001f635\\u200d\\U0001f4ab".encode("ascii").decode("unicode-escape"))
😵‍💫
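A direct string comparison is a more reliable check than eyeballing the repr. A quick sketch:

```python
original = "😵‍💫"  # U+1F635, U+200D, U+1F4AB

escaped = original.encode("ascii", "backslashreplace").decode()
restored = escaped.encode("ascii").decode("unicode-escape")

# The round trip is lossless, even for ZWJ sequences;
# only the escaped \u200d in the repr made it look otherwise.
assert restored == original
```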