8-bit Clean


Title: 8-bit Clean  
Author: World Heritage Encyclopedia
Language: English
Subject: Character encoding, Base64, JED (text editor), POSIX terminal interface, Portable Network Graphics
Publisher: World Heritage Encyclopedia

8-bit clean describes a computer system that correctly handles 8-bit character encodings, such as the ISO 8859 series and the UTF-8 encoding of Unicode.


Up to the early 1990s, many programs and data transmission channels assumed that all characters would be represented as numbers between 0 and 127 (7 bits). On computers and data links using 8-bit bytes, this left the top bit of each byte free for use as a parity bit, flag bit, or metadata control bit. Such 7-bit systems and data links cannot handle the larger character codes that are commonplace in non-English-speaking countries, whose alphabets need more than 128 code points.
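The effect of a channel that is not 8-bit clean can be illustrated with a minimal Python sketch. The helper names `strip_high_bit` and `is_7bit` are hypothetical and chosen only for this illustration; the sketch assumes a channel that simply clears the top bit of each byte, which is one of several ways a 7-bit link might treat that bit.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


def is_7bit(data: bytes) -> bool:
    """Return True if every byte is in the range 0-127."""
    return all(b < 0x80 for b in data)


ascii_text = "cafe".encode("ascii")
latin1_text = "café".encode("latin-1")  # é is the single byte 0xE9

print(is_7bit(ascii_text), strip_high_bit(ascii_text))    # True b'cafe'
print(is_7bit(latin1_text), strip_high_bit(latin1_text))  # False b'cafi'
```

Pure ASCII text passes through unchanged, while the ISO 8859-1 byte 0xE9 loses its top bit and arrives as 0x69, the letter "i".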

Binary files cannot be transmitted through 7-bit data channels directly. To work around this, binary-to-text encodings have been devised which use only 7-bit ASCII characters. Examples include uuencoding, Ascii85, SREC, BinHex, Kermit and MIME's Base64. EBCDIC-based systems cannot handle all of the characters used in uuencoded data; Base64 does not have this problem.
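As a rough illustration of a binary-to-text encoding, the following sketch uses the `base64` module from Python's standard library. The function name `survives_7bit_channel` is hypothetical, and the 7-bit channel is again modelled, as an assumption, by clearing the top bit of each byte.

```python
import base64


def survives_7bit_channel(binary: bytes) -> bool:
    """Encode with Base64, pass through a simulated 7-bit channel, decode."""
    encoded = base64.b64encode(binary)  # output uses only ASCII characters
    received = bytes(b & 0x7F for b in encoded)  # simulated 7-bit link
    return base64.b64decode(received) == binary


# Every possible byte value, including those with the top bit set.
payload = bytes(range(256))
print(survives_7bit_channel(payload))  # True
```

Because the encoded form contains only ASCII characters, clearing the top bit leaves it untouched and the original bytes can be recovered at the far end.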

SMTP and NNTP 8-bit cleanness

Historically, messages were transferred over a variety of media, some of which supported only 7-bit data, so throughout the 20th century an 8-bit message stood a high chance of being garbled in transit. Some implementations, however, ignored the formal discouragement of 8-bit data and allowed bytes with the high bit set to pass through.

Many communications protocol standards, such as RFC 780, RFC 788, RFC 821 for SMTP, RFC 977 for NNTP, RFC 1056, RFC 2821 and RFC 5321, were designed to work over such "7-bit" communication links. They specifically mention the use of the ASCII character set "transmitted as an 8-bit byte with the high-order bit cleared to zero", and some of these[1] explicitly restrict all data to 7-bit characters.

For the first few decades of email networks (1971 to the early 1990s), most email messages were plain text in the 7-bit US-ASCII character set.[2]

According to RFC 1428, the original RFC 821 definition of SMTP limits Internet Mail to lines (1000 characters or fewer) of 7-bit US-ASCII characters.[3][4][5]
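The two restrictions described above, 7-bit US-ASCII content and a 1000-character line limit, can be sketched as a simple check. The function name `fits_rfc821` is hypothetical, and the sketch assumes that the 1000-character limit counts the terminating CRLF and that lines are separated by CRLF.

```python
MAX_LINE = 1000  # limit quoted above; assumed here to include the CRLF


def fits_rfc821(message: bytes) -> bool:
    """Check a message against the 7-bit and line-length restrictions."""
    # Any byte with the top bit set falls outside 7-bit US-ASCII.
    if any(b > 0x7F for b in message):
        return False
    # Each line, plus its two-byte CRLF terminator, must fit the limit.
    return all(len(line) + 2 <= MAX_LINE for line in message.split(b"\r\n"))


print(fits_rfc821(b"Hello, world\r\n"))          # True
print(fits_rfc821("café\r\n".encode("utf-8")))   # False
print(fits_rfc821(b"x" * 999 + b"\r\n"))         # False
```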

Later, the format of email messages was redefined to support messages that are not entirely US-ASCII text: text messages in character sets other than US-ASCII, and non-text messages such as audio and images.[5]

The Internet community generally adds features by "extension", allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be "broken" and insisting that all software worldwide be upgraded to the latest standard. In the mid-1990s, people objected to "just send 8 bits (to RFC 821 SMTP servers)", perhaps because of a perception that "just send 8 bits" is an implicit declaration that ISO 8859-1 become the new "standard encoding", forcing everyone in the world to use the same character set. Instead, the recommended way to take advantage of 8-bit-clean links between machines is to use the ESMTP (RFC 1869) 8BITMIME extension.[6][7]

Despite this, some Mail Transfer Agents, notably Exim and qmail, relay mail to servers that do not advertise 8BITMIME without performing the conversion to 7-bit MIME (typically quoted-printable, "Q-P conversion") required by RFC 6152. In practice this "just-send-8" attitude does not cause problems, since virtually all modern email servers are 8-bit clean.[8]
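The choice described above, sending 8-bit data only to servers that advertise 8BITMIME and otherwise converting to quoted-printable, might be sketched as follows. This is an illustration under stated assumptions, not the logic of any particular Mail Transfer Agent: the function `prepare_body` and its `server_extensions` parameter are hypothetical, and only the `quopri` module is taken from Python's standard library.

```python
import quopri


def prepare_body(body: bytes, server_extensions: set[str]) -> tuple[str, bytes]:
    """Return a (transfer encoding label, bytes to send) pair for a body."""
    # Pure 7-bit content needs no special handling.
    if all(b < 0x80 for b in body):
        return "7bit", body
    # The server says it accepts 8-bit MIME content, so send it as-is.
    if "8BITMIME" in {ext.upper() for ext in server_extensions}:
        return "8bit", body
    # Otherwise fall back to a 7-bit-safe form (the "Q-P conversion").
    return "quoted-printable", quopri.encodestring(body)


body = "café\n".encode("latin-1")
print(prepare_body(body, {"8BITMIME"})[0])  # 8bit
print(prepare_body(body, set())[0])         # quoted-printable
```

In a real client the set of extensions would come from the server's EHLO response; with Python's `smtplib`, for example, `SMTP.has_extn("8bitmime")` reports this after `ehlo()` has been called.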

  1. ^ RFC 780: Appendix A, RFC 788: 4.5.2., RFC 821: Appendix B, RFC 1056: 4.
  2. ^ John Beck. "Email Explained". 2011.
  3. ^ RFC 1428: "SMTP as defined in RFC 821 limits the sending of Internet Mail to US-ASCII characters."
  4. ^ Dan Sugalski. "E-mail with Attachments". "The Perl Journal". Summer 1999. "When mail was standardized way back in 1982 with RFC822, ... The only limits placed on the body were the character set (7-bit ASCII) and the maximum line length (1000 characters)."
  5. ^ a b RFC 2045 "Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages"
  6. ^  
  7. ^ comp.mail.mime FAQ, part 3. "What's ESMTP, and how does it affect MIME?"
  8. ^

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
