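Base64 is a positional numeral system with base 64. 64 is the largest power of two that can be represented using only printable ASCII characters, which makes Base64 suitable as a transfer encoding for email. All Base64 variants use the characters A-Z, a-z and 0-9, 62 characters in total, for the first 62 digits; the symbols used for the last two digits differ between systems. Some other encodings, such as uuencode and later versions of binhex, use a different set of 64 characters to represent 6 binary digits, but they are not called Base64.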
MIME
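In MIME-formatted email, base64 can be used to encode a sequence of binary bytes as text made up of ASCII characters. To use it, specify base64 as the content transfer encoding. The characters used are the 26 uppercase and 26 lowercase letters, the 10 digits, the plus sign "+" and the slash "/", 64 characters in total; the equals sign "=" is used as a padding suffix.
The full definition of base64 is given in RFC 1421 and RFC 2045. Encoded data is longer than the original by a factor of 4/3. In email, a line break (CRLF) must also be inserted at least every 76 characters, the line limit set by RFC 2045. Counting one extra character per 76-character line, the encoded length is about 135.1% of the original; counting both bytes of the CRLF, it is about 136.8%.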
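The length estimate above can be checked with a small calculation. The following is a minimal sketch, not part of any RFC or library: the function names `b64_chars` and `b64_total` and the parameters `line_len` and `break_len` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of Base64 characters (including '=' padding) produced for
 * n input bytes: every group of up to 3 bytes becomes 4 characters. */
size_t b64_chars(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Total output size when a line break of break_len bytes (2 for CRLF)
 * follows every full or partial line of line_len characters.
 * Both parameters are assumptions of this sketch. */
size_t b64_total(size_t n, size_t line_len, size_t break_len)
{
    size_t chars = b64_chars(n);
    size_t lines;

    assert(line_len > 0);
    lines = (chars + line_len - 1) / line_len;   /* round up */
    return chars + lines * break_len;
}
```

For 570 input bytes, `b64_total(570, 76, 2)` gives 780 bytes, a ratio of about 1.368; with a one-byte line break the result is 770, a ratio of about 1.351.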
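To convert, three bytes of data are placed one after another into a 24-bit buffer, with the first byte occupying the high-order bits. If fewer than three bytes remain, the unused bits of the buffer are filled with zeros. Six bits at a time are then taken from the buffer, and each value is used as an index into ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ to select the output character. This repeats until all input data has been converted.
If two input bytes remain at the end, one "=" is appended to the encoded result; if one input byte remains, two "=" are appended; if nothing remains, nothing is appended. This ensures that the data can be restored correctly.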
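The buffer-and-padding procedure just described can be sketched in a few lines of C. This is an illustrative in-memory encoder, not the b64.c utility reproduced later in this document; the names `B64_TABLE` and `b64_encode` are hypothetical, and no line breaks are inserted.

```c
#include <assert.h>
#include <stddef.h>

static const char B64_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes from src into dst as a NUL-terminated Base64 string.
 * dst must have room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t b64_encode(const unsigned char *src, size_t len, char *dst)
{
    size_t i = 0, o = 0;

    assert(dst != NULL && (src != NULL || len == 0));
    while (i < len) {
        size_t n = len - i < 3 ? len - i : 3;   /* bytes in this group */
        unsigned long buf = 0;                  /* 24-bit buffer */
        size_t k;

        /* The first byte ends up in the high bits; missing bytes stay zero. */
        for (k = 0; k < 3; k++) {
            buf <<= 8;
            if (k < n)
                buf |= src[i + k];
        }
        dst[o++] = B64_TABLE[(buf >> 18) & 0x3f];
        dst[o++] = B64_TABLE[(buf >> 12) & 0x3f];
        dst[o++] = n > 1 ? B64_TABLE[(buf >> 6) & 0x3f] : '=';
        dst[o++] = n > 2 ? B64_TABLE[buf & 0x3f] : '=';
        i += n;
    }
    dst[o] = '\0';
    return o;
}
```

With a 16-byte output buffer, encoding the three bytes of "Man" yields "TWFu", the two bytes "Ma" yield "TWE=", and the single byte "M" yields "TQ==".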
For example, take this passage from Thomas Hobbes's Leviathan:
| Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure. |
After base64 encoding it becomes:
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
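An example
ASCII code of "M" = 77 = 01001101
ASCII code of "a" = 97 = 01100001
ASCII code of "n" = 110 = 01101110
Joining these three bytes gives 24 bits of data:
010011010110000101101110
Split into groups of six bits, this yields four numbers. Each is looked up in the character table, whose indices count from 0:
010011 = 19 = T
010110 = 22 = W
000101 = 5 = F
101110 = 46 = u (lowercase letters start at index 26, and 46 - 26 = 20 selects u)
The base64 encoding is therefore "TWFu". Written as 8-bit bytes with two leading zero bits each, the four 6-bit values are:
00010011 00010110 00000101 00101110
That is, every 3 unencoded bytes become 4 bytes after encoding.
- Encoding "M" alone: M = 01001101 is extended with zero bits to 010011010000, which splits into the six-bit groups 010011 010000 and gives "TQ"; two "=" are then appended, so the result is "TQ==".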
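Decoding reverses these steps, and the number of trailing "=" signs says how many bytes of the last group are real data. The sketch below is one possible in-memory decoder under simplifying assumptions: the input is padded, its length is a multiple of 4, and it contains no line breaks. The names `B64_TABLE` and `b64_decode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char B64_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Decode a padded Base64 string (length a multiple of 4, no line breaks)
 * into dst, which must hold 3 * strlen(src) / 4 bytes.
 * Returns the number of bytes written, or (size_t)-1 on malformed input. */
size_t b64_decode(const char *src, unsigned char *dst)
{
    size_t len, i, o = 0;

    assert(src != NULL && dst != NULL);
    len = strlen(src);
    if (len % 4 != 0)
        return (size_t)-1;

    for (i = 0; i < len; i += 4) {
        unsigned long buf = 0;   /* 24-bit buffer, first character highest */
        int pad = 0, k;

        for (k = 0; k < 4; k++) {
            char c = src[i + k];
            const char *p;

            buf <<= 6;
            if (c == '=') {
                /* Padding is only valid in the last two positions
                 * of the final group. */
                if (i + 4 != len || k < 2)
                    return (size_t)-1;
                pad++;
                continue;
            }
            p = strchr(B64_TABLE, c);
            if (pad > 0 || c == '\0' || p == NULL)
                return (size_t)-1;   /* data after '=' or invalid character */
            buf |= (unsigned long)(p - B64_TABLE);
        }
        dst[o++] = (unsigned char)((buf >> 16) & 0xff);
        if (pad < 2)
            dst[o++] = (unsigned char)((buf >> 8) & 0xff);
        if (pad < 1)
            dst[o++] = (unsigned char)(buf & 0xff);
    }
    return o;
}
```

Decoding "TWFu" gives the three bytes of "Man", while "TQ==" gives the single byte "M". The linear `strchr` lookup keeps the sketch short; a lookup table indexed by character would be the usual choice for speed.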
UTF-7
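UTF-7 uses a modified Base64. It mainly encodes UTF-16 data with the Base64 method into a sequence of printable ASCII characters, for the purpose of transmitting Unicode data. The main difference is that the equals sign "=" is not used for padding, because that character usually requires extensive escaping.
The standard is RFC 2152, "A Mail-Safe Transformation Format of Unicode".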
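As a rough illustration of the unpadded encoding, the sketch below turns a run of UTF-16 code units into modified Base64 characters. It covers only the Base64 step: the shift characters that UTF-7 places around an encoded run and the rules for which characters may be left unencoded are not handled. The names `B64_TABLE` and `utf7_b64_encode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static const char B64_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n UTF-16 code units as modified Base64: each unit contributes
 * two bytes, high byte first, and no '=' padding is emitted.
 * dst must hold (16 * n + 5) / 6 + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t utf7_b64_encode(const unsigned short *units, size_t n, char *dst)
{
    unsigned long acc = 0;   /* bit accumulator */
    int bits = 0;            /* number of pending bits in acc */
    size_t i, o = 0;

    assert(units != NULL || n == 0);
    assert(dst != NULL);
    for (i = 0; i < n; i++) {
        acc = (acc << 16) | (units[i] & 0xffffUL);
        bits += 16;
        while (bits >= 6) {
            bits -= 6;
            dst[o++] = B64_TABLE[(acc >> bits) & 0x3f];
        }
        acc &= (1UL << bits) - 1;   /* keep only the pending bits */
    }
    if (bits > 0)   /* fill the final group with zero bits, not with '=' */
        dst[o++] = B64_TABLE[(acc << (6 - bits)) & 0x3f];
    dst[o] = '\0';
    return o;
}
```

The single code unit 0x263A encodes to "Jjo", and the three units 0x65E5, 0x672C, 0x8A9E encode to "ZeVnLIqe"; standard Base64 would have produced "Jjo=" for the first input.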
IRCu
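In the P10 IRC server-to-server protocol used by IRCu and related software, client/server numerics and binary IP addresses are base64-encoded. Numerics have a fixed length of 3 bytes, so they encode directly into 4 characters without padding. When an IP address is encoded, some zero bits must be prepended to the address so that it encodes to a whole number of characters. The symbol set also differs from the MIME one described above: "+/" is replaced by "[]".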
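The zero-bit prefix can be illustrated for a 32-bit IPv4 address. The sketch assumes the address is written as 6 characters (36 bits, so 4 zero bits are prepended) and uses the alphabet described above with "[]" in place of "+/". The function name `p10_encode_ipv4` and the table name `P10_TABLE` are invented for this example and are not taken from the IRCu source.

```c
#include <assert.h>
#include <stddef.h>

static const char P10_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789[]";

/* Encode a 32-bit IPv4 address (host byte order) as 6 characters.
 * 6 characters carry 36 bits, so the value is treated as if 4 zero
 * bits had been prepended. dst must hold 7 bytes. */
void p10_encode_ipv4(unsigned long addr, char *dst)
{
    int i;

    assert(dst != NULL);
    addr &= 0xffffffffUL;
    /* Fill from the last character backwards, 6 bits at a time. */
    for (i = 5; i >= 0; i--) {
        dst[i] = P10_TABLE[addr & 0x3f];
        addr >>= 6;
    }
    dst[6] = '\0';
}
```

Under these assumptions 127.0.0.1 (0x7F000001) becomes "B]AAAB": the leading group 000001 holds the four prepended zero bits plus the top two bits of the address.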
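Use in URLs
Base64 encoding can be used to pass fairly long identifiers in an HTTP environment. For example, Hibernate, a Java persistence framework, uses Base64 to encode a long unique identifier (typically a 128-bit UUID) as a string that serves as a parameter in HTTP forms and HTTP GET URLs. Other applications also often need to encode binary data in a form suitable for URLs (including hidden form fields). Base64 is compact for this purpose, and the result is not human-readable: the encoded data cannot be read directly at a glance.
However, standard Base64 is not suited to being placed directly in a URL, because URL encoders turn the "/" and "+" characters of standard Base64 into the form "%XX", and these "%" signs need a further conversion when stored in a database, because ANSI SQL uses "%" as a wildcard.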
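To see the growth concretely, the following sketch percent-encodes a string the way a typical URL encoder would, escaping everything except ASCII letters, digits and "-._~". The function name `percent_encode` is hypothetical, and real URL encoders differ in which characters they leave alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Percent-encode src into dst, leaving only ASCII letters, digits and
 * "-._~" unescaped. dst must hold 3 * strlen(src) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t percent_encode(const char *src, char *dst)
{
    static const char unreserved[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (strchr(unreserved, c) != NULL) {
            dst[o++] = (char)c;
        } else {
            dst[o++] = '%';
            dst[o++] = hex[c >> 4];
            dst[o++] = hex[c & 0x0f];
        }
    }
    dst[o] = '\0';
    return o;
}
```

The two bytes 0xFB 0xFF encode to "+/8=" in standard Base64; after percent-encoding that becomes "%2B%2F8%3D", 10 characters instead of 4. A string such as "TWFu" that contains none of "+", "/" or "=" passes through unchanged.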
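To solve this, a modified Base64 for URLs can be used. It does not pad the end with "=", and it replaces the "+" and "/" of standard Base64 with "*" and "-" respectively. This removes the conversions otherwise needed for URL encoding and decoding and for database storage, avoids the growth in length of the encoded data along the way, and unifies the format of object identifiers across databases, forms and so on.
Another modified Base64 variant, intended for regular expressions, changes "+" and "/" to "!" and "-", because "+", "*" and the "[" and "]" used by IRCu above can all have special meaning in regular expressions.
There are also variants that change "+/" to "_-" or "._" (for identifier names in programming languages), to ".-" (for Nmtoken in XML), or even to "_:" (for Name in XML).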
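Since these variants differ from standard Base64 only in the last two symbols and in whether padding is kept, one small routine can derive any of them from a standard encoding. The sketch below is illustrative; the function name `b64_to_variant` and its parameters `c62`, `c63` and `keep_padding` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite a standard Base64 string into a variant: '+' becomes c62,
 * '/' becomes c63, and '=' padding is dropped unless keep_padding is
 * non-zero. dst must hold as many bytes as src, including the NUL.
 * Returns the number of characters written, excluding the NUL. */
size_t b64_to_variant(const char *src, char c62, char c63,
                      int keep_padding, char *dst)
{
    size_t o = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        if (*src == '+')
            dst[o++] = c62;
        else if (*src == '/')
            dst[o++] = c63;
        else if (*src == '=' && !keep_padding)
            continue;               /* variant without padding */
        else
            dst[o++] = *src;
    }
    dst[o] = '\0';
    return o;
}
```

Calling `b64_to_variant("+/8=", '*', '-', 0, out)` produces "*-8", the URL form described above, and passing '!' and '-' produces the regular-expression form "!-8". When padding has been dropped, a decoder can work out how many "=" signs were removed from the length modulo 4.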
/*********************************************************************\
MODULE NAME: b64.c
AUTHOR: Bob Trower 08/04/01
PROJECT: Crypt Data Packaging
COPYRIGHT: Copyright (c) Trantor Standard Systems Inc., 2001
NOTE: This source code may be used as you wish, subject to
the MIT license. See the LICENCE section below.
DESCRIPTION:
This little utility implements the Base64
Content-Transfer-Encoding standard described in
RFC1113 (http://www.faqs.org/rfcs/rfc1113.html).
This is the coding scheme used by MIME to allow
binary data to be transferred by SMTP mail.
Groups of 3 bytes from a binary stream are coded as
groups of 4 bytes in a text stream.
The input stream is ‘padded’ with zeros to create
an input that is an even multiple of 3.
A special character (‘=’) is used to denote padding so
that the stream can be decoded back to its exact size.
Encoded output is formatted in lines which should
be a maximum of 72 characters to conform to the
specification. This program defaults to 72 characters,
but will allow more or less through the use of a
switch. The program enforces a minimum line size
of 4 characters.
Example encoding:
The stream ‘ABCD’ is 32 bits long. It is mapped as
follows:
ABCD
A (65) B (66) C (67) D (68) (None) (None)
01000001 01000010 01000011 01000100
16 (Q) 20 (U) 9 (J) 3 (D) 17 (R) 0 (A) NA (=) NA (=)
010000 010100 001001 000011 010001 000000 000000 000000
QUJDRA==
Decoding is the process in reverse. A ‘decode’ lookup
table has been created to avoid string scans.
DESIGN GOALS: Specifically:
Code is a stand-alone utility to perform base64
encoding/decoding. It should be genuinely useful
when the need arises and it meets a need that is
likely to occur for some users.
Code acts as sample code to show the author’s
design and coding style.
Generally:
This program is designed to survive:
Everything you need is in a single source file.
It compiles cleanly using a vanilla ANSI C compiler.
It does its job correctly with a minimum of fuss.
The code is not overly clever, not overly simplistic
and not overly verbose.
Access is ‘cut and paste’ from a web page.
Terms of use are reasonable.
VALIDATION: Non-trivial code is never without errors. This
file likely has some problems, since it has only
been tested by the author. It is expected with most
source code that there is a period of ‘burn-in’ when
problems are identified and corrected. That being
said, it is possible to have ‘reasonably correct’
code by following a regime of unit test that covers
the most likely cases and regression testing prior
to release. This has been done with this code and
it has a good probability of performing as expected.
Unit Test Cases:
case 0:empty file:
CASE0.DAT -> ->
(Zero length target file created
on both encode and decode.)
case 1:One input character:
CASE1.DAT A -> QQ== -> A
case 2:Two input characters:
CASE2.DAT AB -> QUI= -> AB
case 3:Three input characters:
CASE3.DAT ABC -> QUJD -> ABC
case 4:Four input characters:
case4.dat ABCD -> QUJDRA== -> ABCD
case 5:All chars from 0 to ff, linesize set to 50:
AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8gISIj
JCUmJygpKissLS4vMDEyMzQ1Njc4OTo7PD0+P0BBQkNERUZH
SElKS0xNTk9QUVJTVFVWV1hZWltcXV5fYGFiY2RlZmdoaWpr
bG1ub3BxcnN0dXZ3eHl6e3x9fn+AgYKDhIWGh4iJiouMjY6P
kJGSk5SVlpeYmZqbnJ2en6ChoqOkpaanqKmqq6ytrq+wsbKz
tLW2t7i5uru8vb6/wMHCw8TFxsfIycrLzM3Oz9DR0tPU1dbX
2Nna29zd3t/g4eLj5OXm5+jp6uvs7e7v8PHy8/T19vf4+fr7
/P3+/w==
case 6:Mime Block from e-mail:
(Data same as test case 5)
case 7: Large files:
Tested 28 MB file in/out.
case 8: Random Binary Integrity:
This binary program (b64.exe) was encoded to base64,
back to binary and then executed.
case 9 Stress:
All files in a working directory encoded/decoded
and compared with file comparison utility to
ensure that multiple runs do not cause problems
such as exhausting file handles, tmp storage, etc.
————-
Syntax, operation and failure:
All options/switches tested. Performs as
expected.
case 10:
No Args — Shows Usage Screen
Return Code 1 (Invalid Syntax)
case 11:
One Arg (invalid) — Shows Usage Screen
Return Code 1 (Invalid Syntax)
case 12:
One Arg Help (-?) — Shows detailed Usage Screen.
Return Code 0 (Success — help request is valid).
case 13:
One Arg Help (-h) — Shows detailed Usage Screen.
Return Code 0 (Success — help request is valid).
case 14:
One Arg (valid) — Uses stdin/stdout (filter)
Return Code 0 (Success)
case 15:
Two Args (invalid file) — shows system error.
Return Code 2 (File Error)
case 16:
Encode non-existent file — shows system error.
Return Code 2 (File Error)
case 17:
Out of disk space — shows system error.
Return Code 3 (File I/O Error)
————-
Compile/Regression test:
gcc compiled binary under Cygwin
Microsoft Visual Studio under Windows 2000
Microsoft Version 6.0 C under Windows 2000
DEPENDENCIES: None
LICENCE: Copyright (c) 2001 Bob Trower, Trantor Standard Systems Inc.
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated
documentation files (the “Software”), to deal in the
Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall
be included in all copies or substantial portions of the
Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS
OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
VERSION HISTORY:
Bob Trower 08/04/01 — Create Version 0.00.00B
\******************************************************************* */
#include <stdio.h>
#include <stdlib.h>
/*
** Translation Table as described in RFC1113
*/
static const char cb64[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
/*
** Translation Table to decode (created by author)
*/
static const char cd64[]="|$$$}rstuvwxyz{$$$$$$$>?@ABCDEFGHIJKLMNOPQRSTUVW$$$$$$XYZ[\\]^_`abcdefghijklmnopq";
/*
** encodeblock
**
** encode 3 8-bit binary bytes as 4 '6-bit' characters
*/
void encodeblock( unsigned char in[3], unsigned char out[4], int len )
{
    out[0] = cb64[ in[0] >> 2 ];
    out[1] = cb64[ ((in[0] & 0x03) << 4) | ((in[1] & 0xf0) >> 4) ];
    out[2] = (unsigned char) (len > 1 ? cb64[ ((in[1] & 0x0f) << 2) | ((in[2] & 0xc0) >> 6) ] : '=');
    out[3] = (unsigned char) (len > 2 ? cb64[ in[2] & 0x3f ] : '=');
}
/*
** encode
**
** base64 encode a stream adding padding and line breaks as per spec.
*/
void encode( FILE *infile, FILE *outfile, int linesize )
{
    unsigned char in[3], out[4];
    int i, len, blocksout = 0;
    while( !feof( infile ) ) {
        len = 0;
        for( i = 0; i < 3; i++ ) {
            in[i] = (unsigned char) getc( infile );
            if( !feof( infile ) ) {
                len++;
            }
            else {
                in[i] = 0;
            }
        }
        if( len ) {
            encodeblock( in, out, len );
            for( i = 0; i < 4; i++ ) {
                putc( out[i], outfile );
            }
            blocksout++;
        }
        if( blocksout >= (linesize/4) || feof( infile ) ) {
            if( blocksout ) {
                fprintf( outfile, "\r\n" );
            }
            blocksout = 0;
        }
    }
}
/*
** decodeblock
**
** decode 4 '6-bit' characters into 3 8-bit binary bytes
*/
void decodeblock( unsigned char in[4], unsigned char out[3] )
{
    out[ 0 ] = (unsigned char ) (in[0] << 2 | in[1] >> 4);
    out[ 1 ] = (unsigned char ) (in[1] << 4 | in[2] >> 2);
    out[ 2 ] = (unsigned char ) (((in[2] << 6) & 0xc0) | in[3]);
}
/*
** decode
**
** decode a base64 encoded stream discarding padding, line breaks and noise
*/
void decode( FILE *infile, FILE *outfile )
{
    unsigned char in[4], out[3], v;
    int i, len;
    while( !feof( infile ) ) {
        for( len = 0, i = 0; i < 4 && !feof( infile ); i++ ) {
            v = 0;
            while( !feof( infile ) && v == 0 ) {
                v = (unsigned char) getc( infile );
                v = (unsigned char) ((v < 43 || v > 122) ? 0 : cd64[ v - 43 ]);
                if( v ) {
                    v = (unsigned char) ((v == '$') ? 0 : v - 61);
                }
            }
            if( !feof( infile ) ) {
                len++;
                if( v ) {
                    in[ i ] = (unsigned char) (v - 1);
                }
            }
            else {
                in[i] = 0;
            }
        }
        if( len ) {
            decodeblock( in, out );
            for( i = 0; i < len - 1; i++ ) {
                putc( out[i], outfile );
            }
        }
    }
}
/*
** returnable errors
**
** Error codes returned to the operating system.
**
*/
#define B64_SYNTAX_ERROR 1
#define B64_FILE_ERROR 2
#define B64_FILE_IO_ERROR 3
#define B64_ERROR_OUT_CLOSE 4
#define B64_LINE_SIZE_TO_MIN 5
/*
** b64_message
**
** Gather text messages in one place.
**
*/
char *b64_message( int errcode )
{
    #define B64_MAX_MESSAGES 6
    char *msgs[ B64_MAX_MESSAGES ] = {
        "b64:000:Invalid Message Code.",
        "b64:001:Syntax Error -- check help for usage.",
        "b64:002:File Error Opening/Creating Files.",
        "b64:003:File I/O Error -- Note: output file not removed.",
        "b64:004:Error on output file close.",
        "b64:005:linesize set to minimum."
    };
    char *msg = msgs[ 0 ];
    if( errcode > 0 && errcode < B64_MAX_MESSAGES ) {
        msg = msgs[ errcode ];
    }
    return( msg );
}
/*
** b64
**
** 'engine' that opens streams and calls encode/decode
*/
int b64( int opt, char *infilename, char *outfilename, int linesize )
{
    FILE *infile;
    int retcode = B64_FILE_ERROR;
    if( !infilename ) {
        infile = stdin;
    }
    else {
        infile = fopen( infilename, "rb" );
    }
    if( !infile ) {
        perror( infilename );
    }
    else {
        FILE *outfile;
        if( !outfilename ) {
            outfile = stdout;
        }
        else {
            outfile = fopen( outfilename, "wb" );
        }
        if( !outfile ) {
            perror( outfilename );
        }
        else {
            if( opt == 'e' ) {
                encode( infile, outfile, linesize );
            }
            else {
                decode( infile, outfile );
            }
            if (ferror( infile ) || ferror( outfile )) {
                retcode = B64_FILE_IO_ERROR;
            }
            else {
                retcode = 0;
            }
            if( outfile != stdout ) {
                if( fclose( outfile ) != 0 ) {
                    perror( b64_message( B64_ERROR_OUT_CLOSE ) );
                    retcode = B64_FILE_IO_ERROR;
                }
            }
        }
        if( infile != stdin ) {
            fclose( infile );
        }
    }
    return( retcode );
}
/*
** showuse
**
** display usage information, help, version info
*/
void showuse( int morehelp )
{
    {
        printf( "\n" );
        printf( " b64 (Base64 Encode/Decode) Bob Trower 08/03/01 \n" );
        printf( " (C) Copr Bob Trower 1986-01. Version 0.00B \n" );
        printf( " Usage: b64 -option [ -l num ] [<infile> [<outfile>]] \n" );
        printf( " Purpose: This program is a simple utility that implements\n" );
        printf( " Base64 Content-Transfer-Encoding (RFC1113). \n" );
    }
    if( !morehelp ) {
        printf( " Use -h option for additional help. \n" );
    }
    else {
        printf( " Options: -e encode to Base64 -h This help text. \n" );
        printf( " -d decode from Base64 -? This help text. \n" );
        printf( " Note: -l use to change line size (from 72 characters)\n" );
        printf( " Returns: 0 = success. Non-zero is an error code. \n" );
        printf( " ErrCode: 1 = Bad Syntax, 2 = File Open, 3 = File I/O \n" );
        printf( " Example: b64 -e binfile b64file <- Encode to b64 \n" );
        printf( " b64 -d b64file binfile <- Decode from b64 \n" );
        printf( " b64 -e -l40 infile outfile <- Line Length of 40 \n" );
        printf( " Note: Will act as a filter, but this should only be \n" );
        printf( " used on text files due to translations made by \n" );
        printf( " operating systems. \n" );
        printf( " Release: 0.00.00, Tue Aug 7 2:00:00 2001, ANSI-SOURCE C\n" );
    }
}
#define B64_DEF_LINE_SIZE 72
#define B64_MIN_LINE_SIZE 4
#define THIS_OPT(ac, av) (ac > 1 ? av[1][0] == '-' ? av[1][1] : 0 : 0)
/*
** main
**
** parse and validate arguments and call b64 engine or help
*/
int main( int argc, char **argv )
{
    int opt = 0;
    int retcode = 0;
    int linesize = B64_DEF_LINE_SIZE;
    char *infilename = NULL, *outfilename = NULL;
    while( THIS_OPT( argc, argv ) ) {
        switch( THIS_OPT(argc, argv) ) {
            case 'l':
                linesize = atoi( &(argv[1][2]) );
                if( linesize < B64_MIN_LINE_SIZE ) {
                    linesize = B64_MIN_LINE_SIZE;
                    printf( "%s\n", b64_message( B64_LINE_SIZE_TO_MIN ) );
                }
                break;
            case '?':
            case 'h':
                opt = 'h';
                break;
            case 'e':
            case 'd':
                opt = THIS_OPT(argc, argv);
                break;
            default:
                opt = 0;
                break;
        }
        argv++;
        argc--;
    }
    switch( opt ) {
        case 'e':
        case 'd':
            infilename = argc > 1 ? argv[1] : NULL;
            outfilename = argc > 2 ? argv[2] : NULL;
            retcode = b64( opt, infilename, outfilename, linesize );
            break;
        case 0:
            retcode = B64_SYNTAX_ERROR;
            /* fall through to show the usage screen */
        case 'h':
            showuse( opt );
            break;
    }
    if( retcode ) {
        printf( "%s\n", b64_message( retcode ) );
    }
    return( retcode );
}