When person A sends information to person B, A is referencing one specific state out of the possible states that A and B both know about. If those states are ordered, it suffices for A to send a single number to B: the position of the state in that order.
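A toy sketch of this idea follows. It assumes A and B have agreed in advance on the same ordered list of states; the list and the `send`/`receive` names are hypothetical and exist only for illustration. A transmits only the index, and B looks it up.

```python
# shared context: A and B agree on this ordered list beforehand (hypothetical example)
STATES = ["red", "yellow", "green"]

def send(state: str) -> int:
    # A transmits only the position of the state in the shared list
    return STATES.index(state)

def receive(number: int) -> str:
    # B uses the same list to turn the number back into a state
    return STATES[number]

message = send("green")
print(message)           # 2
print(receive(message))  # green
```

The number alone is meaningless without the shared list. The context carries most of the information.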
In the following examples, can you think of how we can use numbers and context to relay the information we need?
Everything can be considered information, and information can be encoded as numbers. This applies not only to computing, but also to biology, where DNA serves as a code that encodes genetic information.
Unary numbers have only one digit. They are the simplest way of representing numbers: the number n is written as n copies of that digit.
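Here is a minimal sketch of unary in Python. It assumes `1` is the single tally mark; the function names `to_unary` and `from_unary` are hypothetical.

```python
def to_unary(n: int) -> str:
    # one tally mark per unit: 3 -> '111'
    if n < 0:
        raise ValueError("unary only represents non-negative integers")
    return "1" * n

def from_unary(s: str) -> int:
    # the value is simply the number of tally marks
    return len(s)

print(to_unary(5))        # 11111
print(from_unary("111"))  # 3
```

The cost is easy to see: writing n takes n symbols. That is why positional systems such as decimal win for anything but tiny numbers.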
We humans like decimal numbers, perhaps because we have ten fingers: counting is easy when we can link it to something physical. Decimal is called base-10 because it has ten digits (0 to 9). For numbers bigger than 9, we reuse the digits we already have by adding another position: 10, 11, 12, and so on.
Hexadecimal numbers have 16 digits: 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f. For numbers bigger than 15, we also reuse the digits we already have (just like base-10). For example, \x11 represents the number 17 in decimal (1×16 + 1). Note: we use \x in front of a number to denote that it is a hexadecimal number.
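Python's built-ins can check this example. `int(s, 16)` parses a hexadecimal string, and `hex()` goes the other way. Python writes hexadecimal integers with a `0x` prefix; the `\x` form appears inside string and bytes literals.

```python
# parse a hexadecimal string: 1*16 + 1 = 17
value = int("11", 16)
print(value)          # 17
# go back from decimal to hexadecimal (Python's own prefix is 0x)
print(hex(value))     # 0x11
# the largest single hex digit
print(int("f", 16))   # 15
# inside a bytes literal, \x11 denotes a single byte with value 17
print(b"\x11"[0])     # 17
```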
Binary numbers have only 2 digits: 0 and 1. For numbers bigger than 1, we again reuse the digits we already have. For example, 11 represents the number 3 in decimal (1×2 + 1).
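Unary aside, decimal, hexadecimal and binary all follow the same positional rule: each digit is multiplied by a power of the base. Below is a small sketch of that rule. It assumes digits are taken from the string `0123456789abcdef`, so it covers bases 2 to 16. The helpers `to_base` and `from_base` are hypothetical names.

```python
DIGITS = "0123456789abcdef"

def to_base(n: int, base: int) -> str:
    # repeatedly divide by the base; the remainders are the digits, least significant first
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(DIGITS[r])
    return "".join(reversed(digits))

def from_base(s: str, base: int) -> int:
    # Horner's rule: shift the running value one position left, then add the next digit
    value = 0
    for ch in s.lower():
        d = DIGITS.index(ch)
        if d >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + d
    return value

print(to_base(3, 2))        # 11
print(to_base(17, 16))      # 11
print(from_base("11", 2))   # 3
print(from_base("11", 16))  # 17
```

Python's built-in `int(s, base)` does the same job as `from_base`.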
While there have been computers that use number systems other than binary, binary is the dominant number system in modern computing because of its simplicity, scalability, and compatibility with the underlying electronics of computers.
Electronics: Binary is well-suited to the underlying electronics of computers, as it maps directly to the two voltage levels (high and low) used in electronic switches.
Memory and storage: each binary digit (or "bit") holds one of two values, 0 or 1, so data can be stored and retrieved as sequences of simple two-state cells.
When conveying information to people (or, more generally, outputting it as a string), we need a representation that is both compact and easy to read. Long strings of binary digits are hard for humans to read.
For example: assuming the least significant bit is on the right, what is the decimal number corresponding to 101111011?
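# add the powers of 2 wherever the bit is 1 (bit 0 is the rightmost)
print(2**0 + 2**1 + 2**3 + 2**4 + 2**5 + 2**6 + 2**8)
# or let Python parse the string as base 2
print(int('101111011', 2))
379
379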
# we use the bin function to look at the binary representation of a decimal number
bin(379)
'0b101111011'
What about the hex representation of 100111011?
hex(int('100111011', 2))
'0x13b'
It's now easier to interpret the binary number "100111011". We split it into chunks of 4 bits, starting from the right: 1 0011 1011. Each 4-bit chunk corresponds to exactly one hex digit (1, 3, b), so we can interpret one chunk at a time.
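The chunking step can be written out directly. This sketch assumes the input is a string of 0s and 1s (`bits_to_hex` is a hypothetical name). It pads the string on the left to a multiple of 4 bits, then maps each chunk to one hex digit.

```python
def bits_to_hex(bits: str) -> str:
    # pad on the left so the length is a multiple of 4 (100111011 -> 000100111011)
    width = -(-len(bits) // 4) * 4
    bits = bits.zfill(width)
    # each 4-bit chunk (a "nibble") maps to exactly one hex digit
    chunks = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(c, 2), "x") for c in chunks)

print(bits_to_hex("100111011"))  # 13b
print(bits_to_hex("101111011"))  # 17b
```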
When we print random bytes in Python, bytes that correspond to printable ASCII characters are shown as those characters, and the remaining bytes are shown in hexadecimal with a \x prefix.
import os
# generating 32 random bytes
r = os.urandom(32)
print(r)
b'\xe8\x1f\x14\x10\x12\x16&U\xe9\xdf\xbf^\x1d\xcf\xb1J\xbf]1&?%\xc1)\x03\xf5\xc5\x90 \x11A\x9b'
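To see every byte in hex, rather than the mix of characters and `\x` escapes that `print` produces, use `bytes.hex()` or `hexlify`. Both give two hex digits per byte. The sketch uses a fixed byte string instead of `os.urandom`, so the output is reproducible.

```python
from binascii import hexlify

# a fixed sample: 0x26 is '&' and 0x55 is 'U' (printable), 0xe8 and 0x1f are not
data = bytes([0xe8, 0x1f, 0x26, 0x55])
print(data)           # b'\xe8\x1f&U'
print(data.hex())     # e81f2655
print(hexlify(data))  # b'e81f2655'
# printable ASCII bytes (0x20 to 0x7e) are the ones shown as characters
print([b for b in data if 0x20 <= b <= 0x7e])  # [38, 85]
```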
Why bytes and not bits?
Programming languages measure memory in bytes because the byte is the smallest unit of memory a computer can address directly. A byte is 8 bits on virtually all modern machines, which is enough to store one ASCII character or a small number (0 to 255) in a single unit. Individual bits can only be reached by shifting and masking whole bytes, which makes them less convenient for storing and manipulating data.
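This small illustration shows the difference in practice. A byte can be read with plain indexing, but a single bit has to be extracted with a shift and a mask. The helper `get_bit` is hypothetical, and it assumes bit 0 is the least significant bit of the first byte.

```python
data = bytes([0b10110000, 0b00000001])

# a whole byte is directly addressable
print(data[0])  # 176

def get_bit(buf: bytes, i: int) -> int:
    # find the byte holding bit i, then shift and mask out that single bit
    byte_index, bit_offset = divmod(i, 8)
    return (buf[byte_index] >> bit_offset) & 1

print(get_bit(data, 7))  # 1 (top bit of the first byte)
print(get_bit(data, 6))  # 0
print(get_bit(data, 8))  # 1 (lowest bit of the second byte)
```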
While hexadecimal may be a good way for us to interpret binary-encoded numbers, it is not the best way to read text!
ASCII is a 7-bit encoding: it defines 128 characters, numbered 0 to 127.
Printable ASCII characters.
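The printable ASCII characters are codes 32 (space) through 126 (`~`). The remaining codes in 0 to 127 are control characters such as tab (9) and newline (10). This short snippet lists the printable ones.

```python
# codes 32..126 are printable; 0..31 and 127 are control characters
printable = "".join(chr(c) for c in range(32, 127))
print(len(printable))  # 95
print(printable)
# ASCII fits in 7 bits: the largest code is 127 = 0b1111111
print(bin(127))        # 0b1111111
```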
# let's look at the byte of the character z
print(b'z')
b'z'
from binascii import hexlify
print(hexlify(b'z'))
b'7a'
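The hex value `7a` is just the character's ASCII code written in base 16. Here is a quick check with `ord` and `chr`, which convert between a character and its code.

```python
from binascii import hexlify

code = ord("z")
print(code)                    # 122
print(hex(code))               # 0x7a
print(7 * 16 + 10)             # 122
print(chr(0x7a))               # z
print(hexlify(b"z") == b"7a")  # True
```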
# let's print an invisible ASCII character:
# I will pick the tab
print('\t hello')
hello
from binascii import hexlify
a = '\t'.encode('ascii')
print(hexlify(a))
print(a)
b'09'
b'\t'
# Let's write hello as ASCII byte values
# (equivalent to a = b'hello')
a = [104, 101, 108, 108, 111]
print(a)
print(bytes(a))
# let's add a new line
a.append(0x0A)
print(bytes(a))
print(bytes(a).decode('ascii'))
[104, 101, 108, 108, 111]
b'hello'
b'hello\n'
hello
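Going the other way, iterating over a bytes object yields the integer codes. A round trip between text and ASCII codes therefore looks like this:

```python
text = "hello\n"
codes = list(text.encode("ascii"))
print(codes)                                 # [104, 101, 108, 108, 111, 10]
print(bytes(codes).decode("ascii") == text)  # True
```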
What happens when we try to print something outside the range of ASCII? ASCII uses 7 bits, but a byte has 8, so the byte values 128 to 255 have no ASCII character.
print(bytes([104,255]))
b'h\xff'
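Python prints such a byte as an escape because it has no ASCII character, and asking the `ascii` codec to decode it fails. In the sketch, `latin-1` is just one example of an 8-bit encoding that assigns a character to every byte value.

```python
data = bytes([104, 255])
try:
    data.decode("ascii")
except UnicodeDecodeError as e:
    print(e)  # 'ascii' codec can't decode byte 0xff in position 1: ordinal not in range(128)
# Latin-1 maps every byte 0..255 to a character, so this succeeds
print(data.decode("latin-1"))  # hÿ
```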
In ASCII, each character costs 7 bits to encode (in practice it is stored in a full 8-bit byte). Can we reduce that further? Can we use 6 bits per character?
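The answer is yes, but at the expense of expressibility. Base64 uses an alphabet of only 64 characters (A-Z, a-z, 0-9, + and /), so each Base64 character carries exactly 6 bits of data (2^6 = 64), compared with 7 bits per ASCII character. The price is that only those 64 characters can be represented. In the example below, decoding the 8 characters of 'helloooo' as Base64 packs them into 8 × 6 = 48 bits, which is 6 bytes instead of 8.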
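import base64
from binascii import hexlify
# treat the 8 characters as Base64 digits: 8 * 6 bits = 48 bits = 6 bytes
a = base64.b64decode('helloooo')
print(hexlify(a))
print(len(a))
# the same text as plain ASCII bytes: 8 bytes
a = b'helloooo'
print(hexlify(a))
print(a)
b'85e965a28a28'
6
b'68656c6c6f6f6f6f'
b'helloooo'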
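To see where the 6 bits per character come from, here is a hand-rolled version of that Base64 decoding step. The name `pack_base64_chars` is hypothetical, and the sketch assumes the input length is a multiple of 4 with no `=` padding. Each character is replaced by its index in the 64-character alphabet, and the 6-bit indices are concatenated and cut into bytes.

```python
import base64
import string

# the standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/' (64 symbols, so 6 bits each)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def pack_base64_chars(text: str) -> bytes:
    # concatenate the 6-bit index of every character into one bit string
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in text)
    # cut the bit string into 8-bit bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

packed = pack_base64_chars("helloooo")
print(len(ALPHABET))  # 64
print(packed.hex())   # 85e965a28a28
print(packed == base64.b64decode("helloooo"))  # True
```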
"Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web where one of its uses is the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files.
Base64 is also widely used for sending e-mail attachments. This is required because SMTP – in its original form – was designed to transport 7-bit ASCII characters only. This encoding causes an overhead of 33–37% (33% by the encoding itself; up to 4% more by the inserted line breaks)." Source: Wikipedia
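The 33% figure follows from the 6-bit alphabet. Every 3 input bytes (24 bits) become 4 output characters, and a final partial group is padded with `=`. The sketch checks this length formula against the standard library; `b64_length` is a hypothetical helper. It also uses `base64.encodebytes`, the MIME-style variant that adds a newline after every 76 output characters, which is where the extra line-break overhead comes from.

```python
import base64
import os

def b64_length(n: int) -> int:
    # ceil(n / 3) groups of 4 characters each
    return 4 * ((n + 2) // 3)

for n in (1, 2, 3, 32, 57, 1000):
    data = os.urandom(n)
    assert len(base64.b64encode(data)) == b64_length(n)

# MIME-style encoding: lines of at most 76 characters, each followed by b"\n"
mime = base64.encodebytes(os.urandom(1000))
lines = mime.splitlines()
print(max(len(line) for line in lines))  # 76
print(len(mime) / 1000)                  # 1.354
```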
import os
import base64
# 32 random bytes: almost certainly some are >= 128, so they are not valid ASCII
r = os.urandom(32)
print(len(r))
try:
    print(r.decode('ascii'))
except UnicodeDecodeError as e:
    print(e)
# Base64 turns arbitrary bytes into ASCII-safe text
b64encoded = base64.b64encode(r)
print(b64encoded)
print(len(b64encoded))
# decoding recovers the original bytes
print(base64.b64decode(b64encoded))
print(len(b64encoded)/len(r))
32
'ascii' codec can't decode byte 0xa8 in position 0: ordinal not in range(128)
b'qE2A0WSQbAa6Fd+HCms53iSVP118gGMvJgoFBxTMYE0='
44
b'\xa8M\x80\xd1d\x90l\x06\xba\x15\xdf\x87\nk9\xde$\x95?]|\x80c/&\n\x05\x07\x14\xcc`M'
1.375
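In this run the ratio is 1.375 rather than 4/3 because 32 is not a multiple of 3. The encoder emits 11 groups of 4 characters (44 in total), and the last group is padded with `=`. This short sketch shows the ratio settling at 4/3 as the input grows.

```python
import base64
import os

results = {}
for n in (32, 33, 300, 3000):
    encoded = base64.b64encode(os.urandom(n))
    results[n] = len(encoded)
    # padding only affects the last group, so its relative cost shrinks as n grows
    print(n, len(encoded), round(len(encoded) / n, 4))
```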