Shannon's basic idea


What did Claude Shannon give us? Well, he created a new branch of maths now known as the mathematical theory of communication, or information theory.  Great.  No doubt that was very important to maths, but why does it matter to anyone else?  At its heart, Shannon gave us a very general language in which to describe many very different things precisely, and to quantify them. This is the language of information. The connection with Turing’s work is immediate once you realise that the inputs and outputs of a Turing machine are information. Turing gave us the idea of a computer, which processes information.  The idea of information already existed, but Shannon gave us the language in which we can describe the bits required for a computer programme, the bandwidth of an internet connection, and all the other things that are essential to make computers actually work. Shannon allowed us to quantify information – to say how much information there is. As computers and information technology like the internet have become so much a part of our daily lives, this language of information – actually there are now multiple such languages – has become increasingly important for navigating the world successfully. Further, we will see in the next section how the idea of a general language of information, now that we have it, can also describe many more things.

Shannon showed in 1948 how information could be transmitted efficiently across communication channels using coded messages. Shannon described a general communication system as a combination of five essential components: an information source, a transmitter, a channel, a receiver and a destination (Shannon, 1948, p. 4; Wiener, 1948, p. 79). The information source produces a message to be communicated to the destination. The transmitter operates on the message to produce a signal suitable for transmission over the channel, which is simply the medium of signal transmission. The receiver reconstructs the message from the signal. And the destination is the entity for which the message is intended. This idea might sound very dull to anyone who uses a mobile phone every day, but without Shannon’s work you wouldn’t have a mobile phone! According to Shannon, communication amounts to the information source producing a sequence of symbols, which is then reproduced at the receiving end. The reproduction is only to some degree of accuracy – as you realise when you don’t hear clearly during a mobile phone conversation.
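To make the five components a little more concrete, here is a minimal sketch in Python. The fixed 8-bit character code and the random bit-flip noise are illustrative assumptions, not anything specified by Shannon; the point is only the shape of the pipeline: source, transmitter, channel, receiver, destination.

```python
import random

# A toy version of Shannon's five-part picture: source -> transmitter ->
# channel -> receiver -> destination. The encoding and the noise model
# are illustrative assumptions, not Shannon's own scheme.

def transmit(message: str) -> str:
    """Transmitter: turn the message into a binary signal (8 bits per character)."""
    return "".join(format(ord(ch), "08b") for ch in message)

def channel(signal: str, flip_probability: float = 0.01) -> str:
    """Channel: the medium may corrupt the signal, here by flipping bits at random."""
    return "".join(
        bit if random.random() > flip_probability else ("1" if bit == "0" else "0")
        for bit in signal
    )

def receive(signal: str) -> str:
    """Receiver: reconstruct the message from the (possibly corrupted) signal."""
    chunks = [signal[i:i + 8] for i in range(0, len(signal), 8)]
    return "".join(chr(int(bits, 2)) for bits in chunks)

message = "Meet you for lunch on Tuesday 2pm"   # information source
received = receive(channel(transmit(message)))  # what reaches the destination
print(received)  # usually close to the original, but only to some degree of accuracy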

Taking this description of communication, Shannon attempted to solve the ‘fundamental problem of communication’ (Shannon, 1948, p. 1): finding the optimal way to reproduce, exactly or approximately, messages at their destination from some source of information. One vital novelty in Shannon’s work is easy to miss: his information theory abstracts away from the physical media of communication, so that the relevant physical constraints can be analysed separately. It doesn’t matter whether you are using a phone – mobile or landline, of any make or model – or one of many different radios, or a Skype call, or sending a large file by email.  Shannon’s theory is about the information transmitted and tells you about that information, whatever the particular physical medium you are using to transmit it. Shannon provided a statistical definition of information, as well as general theorems about the theoretical lower bounds on the bits needed to encode messages and about the capacity of communication channels – which tell you how fast information can be transmitted reliably.

Importantly, on Shannon’s analysis of information (there are now many other analyses), information does not involve any meaning. It doesn’t matter whether you are trying to send, “Meet you for lunch on Tuesday 2pm”, or, “8459264628399583478324724448283”. Shannon information concerns only correlations between messages, variables, etc. For example, it concerns whether the message received matches – correlates with – the message sent.  It is a quantitative measure of how much information is successfully conveyed. Shannon information is much more specific than our ordinary usage of “information”; it tells us nothing about whether a message is useful or interesting. The basic aim is to code messages (perhaps into binary codes like 000010100111010) using the bare minimum of bits we must send to get the message across. The simplest unit of Shannon information records a choice between two equally probable alternatives, such as “On” or “Off”. A sufficient condition for a physical system to count as a sender or receiver of information is that it produces a sequence of symbols in a probabilistic manner (Wiener, 1948, p. 75).
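A rough illustration of this meaning-blind way of counting, in Python: assume, purely for the sake of the example, that each character is drawn independently and with equal probability from an alphabet of 64 symbols. The information count then depends only on the length of the message and the size of the alphabet, never on whether the message says anything.

```python
import math

# Illustrative assumption: each character is selected uniformly from a
# 64-symbol alphabet, so every character carries log2(64) = 6 bits.
def information_bits(message: str, alphabet_size: int = 64) -> float:
    return len(message) * math.log2(alphabet_size)

print(information_bits("Meet you for lunch on Tuesday 2pm"))  # 33 characters -> 198.0 bits
print(information_bits("8459264628399583478324724448283"))    # 31 characters -> 186.0 bits
# The measure tracks symbols sent, not whether the message is useful or interesting.
```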

Shannon’s mathematical theory is still used today in “information theory”, the branch of study that deals with quantitative measures of information. Two of Shannon’s metrics are still commonly used: one a measure of how much information can be stored in a symbol system, the other a measure of the uncertainty of a piece of information. The English anthropologist Gregory Bateson famously defined information as “a difference that makes a difference”, and this definition aptly characterises Shannon’s first metric. Binary is the code usually used by computers, representing everything using only two symbols, 0 and 1. One binary digit, or bit, can represent one of two states: 0 or 1. Two bits can represent four states: 00, 01, 10 and 11. Three bits can represent eight states, four bits sixteen, and so on. This can be generalised by the formula log2(x), where x is the number of distinct states to be encoded. Log2(8), for instance, equals 3, indicating that 3 bits are needed to encode 8 information states. This is, for example, part of what determines how much disc space is needed to store a computer program.
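A short sketch of this relationship in Python (the function names are just illustrative labels, not anything from the text):

```python
import math

# Bits needed to distinguish x equally likely states: log2(x),
# rounded up when x is not a power of two.
def bits_needed(states: int) -> int:
    return math.ceil(math.log2(states))

# Distinct states that n bits can represent: 2**n.
def states_representable(bits: int) -> int:
    return 2 ** bits

print(bits_needed(8))            # 3  -- matches log2(8) = 3 in the text
print(states_representable(3))   # 8  -- 000, 001, ..., 111
print(bits_needed(26))           # 5  -- e.g. enough to index the English alphabet
```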

Shannon’s second metric is “entropy”, a term recommended to him by John von Neumann because of its relation to entropy in thermodynamic systems. Some say this use of the term is fortunate because it captures similar phenomena, while others say it is unfortunate, because the two types of entropy are related only up to a point – the formalisms are analogous rather than identical. Simply put, entropy in thermodynamics measures disorder, whereas information entropy measures the uncertainty, or unpredictability, of a piece of information. An outcome that is highly probable, and hence predictable, has a lower entropy value than one drawn from more evenly distributed possibilities, and therefore tells us less about the world.  One example is that of a coin toss. On the one hand, the toss of a fair coin that may land heads or tails with equal probability has a less predictable outcome, higher entropy, and thus a greater ability to decrease our ignorance about a future state of affairs. On the other hand, a weighted coin that is very likely to fall “Heads” has a very predictable outcome, lower entropy, and therefore can tell us little that we do not already know.
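The coin example can be worked through numerically. The sketch below computes Shannon entropy, H = -Σ p·log2(p), for a fair coin and for a heavily weighted one (the 99%/1% weighting is an illustrative assumption, not a figure from the text):

```python
import math

def entropy(probabilities) -> float:
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = [0.5, 0.5]        # heads and tails equally likely
weighted_coin = [0.99, 0.01]  # almost always heads

print(entropy(fair_coin))      # 1.0 bit   -- maximally unpredictable
print(entropy(weighted_coin))  # ~0.08 bits -- the outcome is nearly a foregone conclusion
```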

The significant aspect of Shannon information is that a produced message is selected from a set of possible messages. The more possible messages the recipient could otherwise have received, the more surprised the recipient is when it gets that particular message. Receiving a message changes the recipient’s circumstance from not knowing something to knowing what it is. The average amount of data deficit (uncertainty or surprise) of the recipient is also known as informational entropy (Floridi, 2010a, Chapter 3). The higher the probability that a message is selected, the lower the amount of Shannon information associated with it.1
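This relationship between probability and surprise can be sketched directly from the formula given in footnote 1 below, Info(s) = log2(1/Prob(s)); the particular probabilities used here are only examples:

```python
import math

def surprisal(prob: float) -> float:
    """Shannon information of a message selected with probability prob:
    Info(s) = log2(1 / Prob(s)), as in footnote 1."""
    return math.log2(1 / prob)

# A message drawn from 1,000 equally likely alternatives carries far more
# information than one drawn from only 2 alternatives.
print(surprisal(1 / 1000))  # ~9.97 bits
print(surprisal(1 / 2))     # 1.0 bit

# Informational entropy is the average surprisal of a source's messages,
# weighted by how often each is selected.
probs = [0.5, 0.25, 0.25]
print(sum(p * surprisal(p) for p in probs))  # 1.5 bits
```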

The importance of uncertainty to Shannon information has an important implication. Although Shannon is not concerned with what messages mean to us, the amount of Shannon information conveyed is as much a property of our own knowledge as of anything in the message. If we send the same message twice every time (a message and its copy), the information in the two messages is not the sum of that in each: the information comes only from the first message, while the second is redundant. Still, for Shannon the semantic aspects of messages carrying meaning are ‘irrelevant to the engineering problem [of communication]’ (Shannon, 1948, p. 1). This, then, is how Shannon showed us what could be done with the concept of information.
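A brief sketch of the redundancy point, under the illustrative assumption of a source with four equally likely messages: sending each message together with an exact copy leaves the uncertainty, and hence the information, unchanged rather than doubling it.

```python
import math

def entropy(probabilities) -> float:
    """Shannon entropy in bits over a probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A source X with four equally likely messages carries 2 bits per message.
p_x = [0.25, 0.25, 0.25, 0.25]
print(entropy(p_x))  # 2.0 bits

# Sending every message together with an exact copy: the pair (x, x) still has
# only four possible values, one per original message, each with probability 0.25,
# so the joint entropy is still 2.0 bits -- not the 4.0 bits simple addition suggests.
p_pair = [0.25, 0.25, 0.25, 0.25]  # one value per possible (x, x) pair
print(entropy(p_pair))  # 2.0 bits: the copy is pure redundancy
```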

  • 1. Shannon information is defined as the base-two logarithm of the inverse of the probability of selecting a message s: Info(s) = log2(1/Prob(s)).