Hello in binary. Binary code

Computers don't understand words and numbers the way people do. Modern software allows the end user to ignore this, but at the lowest levels your computer operates on a binary electrical signal that has only two states: whether there is current or not. To "understand" complex data, your computer must encode it in binary format.

The binary system uses two digits, 1 and 0, corresponding to the on and off states your computer can recognize. You are probably familiar with the decimal system: it uses ten digits, 0 through 9, and then moves to the next place to form two-digit numbers, with each place worth ten times the previous one. The binary system works the same way, except that each place is worth only twice the previous one.

Counting in binary format

In a binary number, the first (rightmost) digit is worth 1 in the decimal system. The second digit is worth 2, the third 4, the fourth 8, and so on, doubling each time. Adding up the values of the digits set to 1 gives you the number in decimal.

1111 (in binary) = 8 + 4 + 2 + 1 = 15 (in decimal)

Counting 0, that gives 16 possible values for four binary bits. Move up to 8 bits and you get 256 possible values. By comparison, this takes many more digits to write down: four decimal digits already give 10,000 possible values. So binary takes up more space, but computers handle binary far better than decimal, and for some things, such as logic processing, binary is the better fit.

It should be said that there is another number system used in programming: hexadecimal. Although computers do not operate in hexadecimal, programmers use it to represent binary values in a human-readable form when writing code. This is because two hexadecimal digits can represent a whole byte, replacing eight binary digits. The hexadecimal system uses the digits 0-9 plus the letters A through F as six additional digits.
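To make this concrete, here is a minimal Python sketch (the sample values are arbitrary) showing the same number in decimal, binary and hexadecimal form, and that two hex digits cover exactly one byte:

    value = 0b1111            # binary literal: 8 + 4 + 2 + 1
    print(value)              # 15 (decimal)
    print(bin(value))         # 0b1111
    print(hex(value))         # 0xf

    byte = 0b11111111         # eight binary digits = one byte
    print(byte, hex(byte))    # 255 0xff -> two hex digits replace eight binary ones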

Why do computers use binary?

Short answer: hardware and the laws of physics. Every symbol in your computer is an electrical signal, and in the early days of computing, measuring electrical signals precisely was much more difficult. It made more sense to distinguish only an "on" state, represented by a negative charge, and an "off" state, represented by a positive charge.

For those wondering why "off" is represented by a positive charge: electrons carry a negative charge, and more electrons mean more current with a negative charge.

Thus, early room-sized computers used binary to build their systems, and although they relied on older, bulkier hardware, they worked on the same fundamental principles. Modern computers use what are called transistors to perform calculations with binary code.

Here is a diagram of a typical transistor:

Essentially, it allows current to flow from the source to the drain if there is current at the gate. This forms a binary switch. Manufacturers can make these transistors incredibly small, down to 5 nanometers, roughly the width of two strands of DNA. This is how modern processors work, and even they can have trouble distinguishing between the on and off states, because at such near-molecular sizes they become subject to the weirdness of quantum mechanics.

Why only binary?

So you might be thinking, "Why only 0 and 1? Why not add another digit?" Although this is partly down to the tradition of how computers are built, adding another digit would also mean distinguishing another state of the current, not just "off" or "on".

The problem here is that if you want to use several voltage levels, you need a way to perform calculations on them easily, and hardware capable of that is not viable as a replacement for binary computation. For example, ternary computers were built in the 1950s, but development stopped there. Ternary logic is more efficient than binary, but there is as yet no effective replacement for the binary transistor, or at least none at the same tiny scale.

The reason we can't use ternary logic comes down to how transistors are connected in a computer and how they are used for calculations: a logic gate receives information at two inputs, performs an operation, and returns the result on one output.

Thus, binary arithmetic is easier for a computer than anything else. Boolean logic maps directly onto binary systems, with True and False corresponding to the on and off states.

A two-input binary truth table has four possible input combinations for each fundamental operation. Since each ternary input can take three values, a two-input ternary truth table has nine. While the binary system allows 16 possible two-input operators (2^(2^2)), the ternary system allows 19,683 (3^(3^2)). Scaling becomes an issue: while ternary is more efficient, it is also exponentially more complex.
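These counts are easy to check with a short Python sketch (it simply evaluates k^(k^2) for a two-input gate in k-valued logic, rather than enumerating real hardware):

    def two_input_operators(k: int) -> int:
        # Each of the k**2 input combinations can map to any of k outputs,
        # so there are k ** (k ** 2) possible two-input truth tables.
        return k ** (k ** 2)

    print(two_input_operators(2))   # 16     (binary)
    print(two_input_operators(3))   # 19683  (ternary)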

Who knows? In the future, we may well see ternary computers as binary logic faces miniaturization challenges. For now, the world will continue to operate in binary mode.

This lesson covers the topic "Encoding information. Binary coding. Units of measurement of information." In it, students will gain an understanding of information encoding, how computers perceive information, units of measurement, and binary coding.

Subject: Information around us

Lesson: Information coding. Binary coding. Units of information

This lesson will cover the following questions:

1. Coding as changing the form of information presentation.

2. How does a computer recognize information?

3. How to measure information?

4. Units of measurement of information.

In the world of codes

Why do people encode information?

1. To hide it from others (Leonardo da Vinci's mirror writing, military encryption).

2. To record the information more briefly (shorthand, abbreviations, road signs).

3. For easier processing and transmission (Morse code, translation into electrical signals, machine codes).

Coding is the representation of information using some code.

Code is a system of symbols for presenting information.

Methods of encoding information

1. Graphic (see Fig. 1) (using drawings and signs).

Fig. 1. Signal flag system

2. Numerical (using numbers).

For example: 11001111 11100101.

3. Symbolic (using alphabet symbols).

For example: NKMBM CHGYOU.

Decoding is an action to restore the original form of information presentation. To decode, you need to know the code and encoding rules.

The means of encoding and decoding is a code correspondence table. For example, the correspondence between different number systems: 24 - XXIV; or the correspondence of the alphabet with arbitrary symbols (Fig. 2).


Fig. 2. Cipher example

Examples of information encoding

An example of information coding is Morse code (see Figure 3).

Fig. 3. Morse code

Morse code uses only 2 symbols - a dot and a dash (short and long sound).

Another example of information encoding is the flag alphabet (see Fig. 4).

Fig. 4. Flag alphabet

Another example is the alphabet of flags (see Fig. 5).

Fig. 5. Alphabet of flags

A well-known example of coding is the musical alphabet (see Fig. 6).

Fig. 6. Musical alphabet

Consider the following problem:

Using the flag alphabet table (see Fig. 7), it is necessary to solve the following problem:

Fig. 7

Senior mate Lom passes the exam to Captain Vrungel. Help him read the following text (see Figure 8):

Around us we mostly encounter binary (two-state) signals, for example:

Traffic light: red - green;

Question: yes - no;

Lamp: on - off;

Allowed - not allowed;

Good - bad;

True - false;

Forward - backward;

Yes - no.

All of these are signals carrying an amount of information equal to 1 bit.

1 bit is the amount of information that allows us to choose one option out of two possible ones.

A computer is an electrical machine that operates on electronic circuits. For a computer to recognize and understand the input information, it must be translated into computer (machine) language.

The algorithm intended for the executor must be written down, that is, encoded, in a language the computer can understand.

These are electrical signals: current flowing or current not flowing.

Machine binary language is a sequence of "0"s and "1"s. Each binary digit can have the value 0 or 1.

Each digit of a machine binary code carries an amount of information equal to 1 bit.

The binary digit that represents the smallest unit of information is called a bit. A bit can take the value 0 or 1. The presence of a magnetic or electrical signal in the computer means 1; its absence means 0.

A string of 8 bits is called a byte. The computer processes this string as a single character (a number or a letter).

Let's look at an example. The word ALICE consists of 5 letters, each of which is represented in computer language by one byte (see Fig. 10). Therefore, the word ALICE can be measured as 5 bytes.

Fig. 10. Binary code
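As an illustration, a small Python sketch (it assumes the Latin spelling "ALICE" and standard ASCII codes, since the figure itself is only a sample):

    word = "ALICE"
    codes = word.encode("ascii")              # one byte per letter
    print(len(codes), "bytes")                # 5 bytes
    for letter, code in zip(word, codes):
        print(letter, format(code, "08b"))    # 8-bit binary code of each letter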

In addition to bits and bytes, there are other units of information.

Bibliography

1. Bosova L.L. Computer Science and ICT: Textbook for 5th grade. - M.: BINOM. Knowledge Laboratory, 2012.

2. Bosova L.L. Computer Science: Workbook for 5th grade. - M.: BINOM. Knowledge Laboratory, 2010.

3. Bosova L.L., Bosova A.Yu. Computer science lessons in grades 5-6: Methodological manual. - M.: BINOM. Knowledge Laboratory, 2010.

4. Festival "Open Lesson".

Homework

1. §1.6, 1.7 (Bosova L.L. Informatics and ICT: Textbook for grade 5).

2. Page 28, tasks 1, 4; p. 30, tasks 1, 4, 5, 6 (Bosova L.L. Informatics and ICT: Textbook for grade 5).

The meaning of the term "binary" is that it consists of two parts or components. Thus, binary codes are codes built from only two symbolic states, such as black or white, light or dark, conductor or insulator. In digital technology, a binary code is a way of representing data (numbers, words, and so on) as a combination of two characters, which can be designated 0 and 1. The characters, or units, of binary code are called bits. One of the justifications for using binary code is the simplicity and reliability of storing information on any medium as a combination of just two of its physical states, for example, as a change or constancy of the light flux when reading from an optical code disc.
There are various possibilities for encoding information.

Binary code

Binary code, in digital technology, is a method of representing data (numbers, words, and others) as a combination of two characters, which can be designated 0 and 1. The signs, or units, of binary code are called bits.

One of the justifications for using binary code is the simplicity and reliability of storing information on any medium as a combination of just two of its physical states, for example, as a change or constancy of the magnetic flux in a given cell of a magnetic recording medium.

The largest number that can be expressed in binary depends on the number of digits used, i.e., on the number of bits in the combination expressing the number. For example, to express the numeric values from 0 to 7, a 3-digit (3-bit) code is enough:

Decimal value   Binary code
     0             000
     1             001
     2             010
     3             011
     4             100
     5             101
     6             110
     7             111

From this it is clear that for numbers greater than 7, a 3-digit code has no more combinations of 0 and 1 available.

Moving from numbers to physical quantities, let us formulate the above statement in a more general form: the largest number of values m of any quantity (temperature, voltage, current, etc.) that can be expressed in binary code depends on the number of bits used n as m = 2^n. If n = 3, as in the example considered, we get 8 values, including 0.
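The relation m = 2^n is easy to verify directly (a minimal Python sketch):

    for n in (3, 4, 8):
        print(n, "bits ->", 2 ** n, "values")   # 3 -> 8, 4 -> 16, 8 -> 256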
Binary code is a multi-step code. This means that when moving from one position (value) to the next, several bits can change simultaneously. For example, the number 3 in binary code is 011, and the number 4 is 100; accordingly, when moving from 3 to 4, all three bits change state at once. Reading such a code from a code disc would mean that, because of the inevitable deviations (tolerances) in manufacturing the disc, the information on the individual tracks would never change at exactly the same moment. This, in turn, would mean that when moving from one number to another, incorrect information could briefly be output. During the transition from 3 to 4 mentioned above, for example, a short-term output of the number 7 is quite likely if the most significant bit changes its value slightly earlier than the others. To avoid this, a so-called one-step code is used, for example the Gray code.

Gray code

Gray code is a so-called one-step code: when moving from one number to the next, only one bit ever changes. An error when reading information from a mechanical code disc during the transition from one number to another will only shift that transition slightly in time; the output of a completely wrong angular position value during the transition is ruled out.
Another advantage of the Gray code is its ability to mirror information. By inverting the most significant bit, you can simply reverse the direction of counting and thus match the actual (physical) direction of rotation of the axis. The counting direction can easily be switched by controlling a so-called "Complement" input, so the output value can be made increasing or decreasing for the same physical direction of rotation of the axis.
Since the information expressed in Gray code is purely encoded and does not carry real numerical information, it must first be converted into standard binary code before further processing. This is done using a code converter (a Gray-to-binary decoder), which is easily implemented with a circuit of exclusive-or (XOR) logic elements, either in software or in hardware.

Correspondence of decimal numbers from 0 to 15 to binary code and Gray code (the decimal and hex columns under "Gray coding" give the value of the Gray bit pattern read as an ordinary binary number):

             Binary coding                       Gray coding
Decimal   Binary value   Hex value    Decimal   Binary value   Hex value
   0          0000          0h           0          0000          0h
   1          0001          1h           1          0001          1h
   2          0010          2h           3          0011          3h
   3          0011          3h           2          0010          2h
   4          0100          4h           6          0110          6h
   5          0101          5h           7          0111          7h
   6          0110          6h           5          0101          5h
   7          0111          7h           4          0100          4h
   8          1000          8h          12          1100          Ch
   9          1001          9h          13          1101          Dh
  10          1010          Ah          15          1111          Fh
  11          1011          Bh          14          1110          Eh
  12          1100          Ch          10          1010          Ah
  13          1101          Dh          11          1011          Bh
  14          1110          Eh           9          1001          9h
  15          1111          Fh           8          1000          8h

Converting the Gray code to the usual binary code can be done using a simple circuit with inverters and exclusive-or gates as shown below:
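In place of the circuit, here is a software sketch of the same conversion (Python; a minimal illustration, not the hardware implementation itself). Binary-to-Gray is b ^ (b >> 1); Gray-to-binary XORs the bits cumulatively from the most significant end, which is exactly what the chain of XOR gates does:

    def binary_to_gray(b: int) -> int:
        # Each Gray bit is the XOR of two adjacent binary bits.
        return b ^ (b >> 1)

    def gray_to_binary(g: int) -> int:
        # Cumulative XOR from the most significant bit downwards.
        b = g
        shift = 1
        while (g >> shift) > 0:
            b ^= g >> shift
            shift += 1
        return b

    # Reproduce the 0..15 table above and check the one-step property:
    # neighbouring Gray codes differ in exactly one bit.
    prev = None
    for n in range(16):
        g = binary_to_gray(n)
        assert gray_to_binary(g) == n
        if prev is not None:
            assert bin(prev ^ g).count("1") == 1
        prev = g
        print(f"{n:2d}  {n:04b}  ->  {g:04b}")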

Gray excess code

The usual one-step Gray code is suitable for resolutions that are powers of 2. In cases where other resolutions have to be implemented, a middle section is cut out of the regular Gray code and used; this way the code remains one-step. However, the numeric range then does not start at zero but is shifted by a certain value. When the information is processed, half the difference between the original and the reduced resolution is subtracted from the generated signal. Resolutions such as 360 steps, used to express an angle in degrees, are often implemented by this method: a 9-bit Gray code of 512 steps, trimmed by 76 steps on each side, gives exactly 360 steps.
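A rough sketch of this trimming in Python (using the 9-bit, 360-step figures from the example above; the helper name is an assumption of mine):

    def gray_excess_offset(n_bits: int, resolution: int) -> int:
        # Half the difference between the full and the reduced resolution
        # is cut from each end of the 2**n_bits Gray sequence.
        return (2 ** n_bits - resolution) // 2

    offset = gray_excess_offset(9, 360)
    print(offset, 2 ** 9 - 2 * offset)   # 76 360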

Binary code decoding is used to translate from machine language to regular language. Online tools work quickly, although it is not difficult to do it manually.

Binary code is used to transmit information digitally. A set of just two characters, such as 1 and 0, allows you to encode any information, be it text, numbers or an image.

How to encode with binary code

To convert symbols into binary code manually, tables are used in which each symbol is assigned a binary code made up of zeros and ones. The most common encoding system is ASCII, which uses 8-bit code notation.

The basic table shows binary codes for the Latin alphabet, numbers and some symbols.

A binary interpretation of the Cyrillic alphabet and additional characters has been added to the extended table.

To convert from binary code to text or numbers, simply look up the required codes in the tables. But, of course, doing this work manually takes a long time, and mistakes are inevitable. A computer copes with decoding much faster, and while typing text on the screen we don't even think that at that very moment it is being converted into binary code.
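This is essentially what a few lines of Python do (assuming plain 8-bit ASCII, as in the basic table mentioned above):

    def text_to_binary(text: str) -> str:
        # Look up each character's ASCII code and write it as 8 binary digits.
        return " ".join(format(ord(ch), "08b") for ch in text)

    def binary_to_text(bits: str) -> str:
        # Reverse step: read the 8-bit groups back into characters.
        return "".join(chr(int(group, 2)) for group in bits.split())

    encoded = text_to_binary("Hi")
    print(encoded)                    # 01001000 01101001
    print(binary_to_text(encoded))    # Hi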

Converting a binary number to decimal

To manually convert a number from a binary number system to a decimal number system, you can use a fairly simple algorithm:

  1. Below the binary number, starting with the rightmost digit, write the number 2 in increasing powers.
  2. The powers of 2 are multiplied by the corresponding digit of the binary number (1 or 0).
  3. Add the resulting values.

This is what this algorithm looks like on paper:
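The same three steps can also be sketched in Python (the sample number 1111 is arbitrary):

    def binary_to_decimal(binary: str) -> int:
        total = 0
        # Steps 1-2: pair each digit, starting from the right, with a growing power of 2.
        for power, digit in enumerate(reversed(binary)):
            total += int(digit) * 2 ** power
        # Step 3: the accumulated sum is the decimal value.
        return total

    print(binary_to_decimal("1111"))                       # 15
    print(binary_to_decimal("1111") == int("1111", 2))     # True (built-in cross-check)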

Online services for binary decoding

If you still need to see decoded binary code or, conversely, convert text into binary form, the easiest way is to use online services designed for this purpose.

Two windows, familiar from online translators, let you see both versions of the text, regular and binary, almost simultaneously, and the conversion works in both directions. Entering text is a simple matter of copying and pasting.


1.5.1. Converting information from continuous to discrete form
To solve his problems, a person often has to transform existing information from one form of representation to another. For example, when reading aloud, information is converted from discrete (text) form to continuous (sound). During a dictation in a Russian language lesson, on the contrary, information is transformed from a continuous form (the teacher’s voice) into a discrete one (students’ notes).
Information presented in discrete form is much easier to transmit, store or automatically process. Therefore, in computer technology, much attention is paid to methods for converting information from continuous to discrete form.
Discretization of information is the process of converting information from a continuous form of representation to a discrete one.
Let's look at the essence of the information sampling process using an example.
Meteorological stations have recorders for continuous recording of atmospheric pressure. The result of their work is barograms - curves showing how pressure has changed over long periods of time. One of these curves, drawn by the device during seven hours of observation, is shown in Fig. 1.9.

Based on the information received, you can build a table containing the instrument readings at the beginning of measurements and at the end of each hour of observation (Fig. 1.10).

The resulting table does not give a complete picture of how the pressure changed during the observation period: for example, the highest pressure value, which occurred during the fourth hour of observation, is not shown. But if you tabulate the pressure values observed every half hour or every 15 minutes, the new table will give a fuller picture of how the pressure changed.
Thus, we converted information presented in continuous form (barogram, curve) into discrete form (table) with some loss of accuracy.
In the future, you will become familiar with ways to discretely represent audio and graphic information.

Chains of three binary symbols are obtained by appending the symbol 0 or 1 to the right of the two-digit binary codes. As a result, there are 8 code combinations of three binary symbols, twice as many as of two binary symbols.
Accordingly, a four-bit binary code gives 16 code combinations, a five-bit one 32, a six-bit one 64, and so on. The length of a binary chain, that is, the number of characters in the binary code, is called the bit depth of the binary code.
Note that:
4 = 2 * 2,
8 = 2 * 2 * 2,
16 = 2 * 2 * 2 * 2,
32 = 2 * 2 * 2 * 2 * 2 etc.
Here, the number of code combinations is the product of a certain number of identical factors equal to the bit depth of the binary code.
If the number of code combinations is denoted by the letter N, and the bit depth of the binary code by the letter i, then the identified pattern in general form will be written as follows:
N = 2 * 2 * ... * 2  (i factors).

In mathematics, such products are written as N = 2^i.

The entry 2^i is read as "2 to the i-th power."

Task. The leader of the Multi tribe instructed his minister to develop a binary code and translate all important information into it. What bit depth of binary code will be required if the alphabet used by the Multi tribe contains 16 characters? Write down all the code combinations.
Solution. Since the Multi tribe's alphabet consists of 16 characters, they need 16 code combinations. In this case, the length (bit depth) of the binary code is determined from the relation 16 = 2^i, hence i = 4.
To write down all the code combinations of four 0s and 1s, we use the diagram in Fig. 1.13: 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111.
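The same answer can be checked programmatically (a minimal Python sketch of the task):

    import itertools
    import math

    alphabet_size = 16
    bit_depth = int(math.log2(alphabet_size))      # 16 = 2**i  ->  i = 4
    codes = ["".join(bits) for bits in itertools.product("01", repeat=bit_depth)]
    print(bit_depth)          # 4
    print(len(codes))         # 16
    print(codes[:4], "...")   # ['0000', '0001', '0010', '0011'] ...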

1.5.3. The versatility of binary coding
At the beginning of this section, you learned that information represented in continuous form can be expressed using symbols of some natural or formal language. In turn, the characters of an arbitrary alphabet can be converted into binary code. Thus, with binary code, any natural and formal languages, as well as images and sounds, can be represented (Fig. 1.14). This is what is meant by the universality of binary coding.
Binary codes are widely used in computer technology, requiring only two states of an electronic circuit - “on” (this corresponds to the number 1) and “off” (this corresponds to the number 0).
Simplicity of technical implementation is the main advantage of binary coding. The disadvantage of binary coding is the large length of the resulting code.

1.5.4. Uniform and non-uniform codes
There are uniform and non-uniform codes. Uniform codes contain the same number of symbols in each code combination; non-uniform codes contain different numbers.
Above we looked at uniform binary codes.
An example of a non-uniform code is Morse code, in which each letter and digit is assigned its own sequence of short and long signals. Thus, the letter E corresponds to a single short signal (a "dot"), while the letter Ш corresponds to four long signals (four "dashes"). Non-uniform codes make it possible to increase the speed of message transmission, because the most frequently occurring symbols in the transmitted information get the shortest code combinations.
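A small illustration of a non-uniform code in Python (only a handful of letters of the international Morse alphabet are included; the selection is illustrative):

    morse = {
        "E": ".",       # very frequent letter - shortest code
        "T": "-",
        "A": ".-",
        "N": "-.",
        "J": ".---",    # rarer letters get longer combinations
        "Q": "--.-",
    }
    for letter, code in morse.items():
        print(letter, code, "-", len(code), "symbols")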

The information carried by one elementary symbol is equal to the entropy of the system and is maximal when both states are equally probable; in this case, the elementary symbol conveys 1 binary unit of information. Therefore, the basis of optimal encoding is the requirement that the elementary symbols occur in the encoded text, on average, equally often.

Let us present here a method for constructing a code that satisfies the stated condition; this method is known as the Shannon-Fano code. Its idea is that the symbols to be encoded (letters or combinations of letters) are divided into two approximately equally probable groups: for the first group of symbols, a 0 is placed in the first position of the combination (the first character of the binary number representing the symbol); for the second group, a 1. Then each group is again divided into two approximately equally probable subgroups; for the symbols of the first subgroup a zero is placed in the second position, for the second subgroup a one, and so on.

Let us demonstrate the principle of constructing the Shannon-Fano code using the Russian alphabet (Table 18.8.1). Count off the first six letters (from "-" to "t"); summing their probabilities (frequencies), we get 0.498; all the remaining letters (from "n" to "f") have approximately the same total probability, 0.502. The first six letters (from "-" to "t") get a binary 0 in the first position; the remaining letters (from "n" to "f") get a 1 in the first position. Next, we again divide the first group into two approximately equally probable subgroups: from "-" to "o" and from "e" to "t"; for all letters of the first subgroup we put a zero in the second position, and for the second subgroup a one. We continue the process until exactly one letter remains in each division, which is then encoded by a certain binary number. The mechanism for constructing the code is shown in Table 18.8.2, and the code itself is given in Table 18.8.3.
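A rough Python sketch of this splitting procedure (the symbol frequencies below are hypothetical placeholders, since Table 18.8.1 is not reproduced here):

    def shannon_fano(symbols):
        # symbols: list of (symbol, probability) pairs, sorted by descending probability.
        if len(symbols) == 1:
            return {symbols[0][0]: ""}
        # Find the split that makes the two groups as close to equally probable as possible.
        total = sum(p for _, p in symbols)
        best_split, best_diff = 1, float("inf")
        for i in range(1, len(symbols)):
            left = sum(p for _, p in symbols[:i])
            diff = abs(2 * left - total)
            if diff < best_diff:
                best_split, best_diff = i, diff
        codes = {}
        for sym, code in shannon_fano(symbols[:best_split]).items():
            codes[sym] = "0" + code      # first group gets 0 in this position
        for sym, code in shannon_fano(symbols[best_split:]).items():
            codes[sym] = "1" + code      # second group gets 1 in this position
        return codes

    # Hypothetical frequencies, just to show the mechanism.
    freqs = [("-", 0.30), ("o", 0.20), ("e", 0.15), ("a", 0.15), ("i", 0.10), ("t", 0.10)]
    codes = shannon_fano(freqs)
    print(codes)
    print("".join(codes[ch] for ch in "toe"))   # encode a short message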

Table 18.8.2.

Binary signs

Table 18.8.3

Using Table 18.8.3, you can encode and decode any message.

As an example, let’s write the phrase “information theory” in binary code.

01110100001101000110110110000

0110100011111111100110100

1100001011111110101100110

Note that there is no need to separate the letters from each other with a special sign, since decoding is performed unambiguously even without this. You can verify this by decoding the following phrase using Table 18.8.2:

10011100110011001001111010000

1011100111001001101010000110101

010110000110110110

(“encoding method”).

However, it should be noted that any encoding error (random confusion of 0 and 1 characters) with such a code is disastrous, since decoding all text following the error becomes impossible. Therefore, this coding principle can be recommended only in cases where errors in encoding and transmitting a message are practically eliminated.

A natural question arises: is the code we have compiled, in the absence of errors, really optimal? In order to answer this question, let's find the average information per elementary symbol (0 or 1) and compare it with the maximum possible information, which is equal to one binary unit. To do this, we first find the average information contained in one letter of the transmitted text, i.e., entropy per letter:

H = - Σ p_i log2 p_i,

where p_i is the probability that a letter takes a particular state ("-", o, e, a, ..., f).

From Table 18.8.1 we obtain the entropy per letter of text (in binary units per letter).

Using Table 18.8.2, we determine the average number of elementary symbols per letter.

Dividing the entropy by this average number, we obtain the information per elementary symbol (in binary units).

Thus, the information per character is very close to its upper limit of 1, and the code we have chosen is very close to the optimal one. Remaining within the confines of the task of encoding letters, we cannot achieve anything better.

Note that if we simply encoded the letters by their binary ordinal numbers, each letter would be represented by five binary characters, and the information per elementary symbol would be the entropy per letter divided by five (in binary units), i.e., noticeably less than with optimal letter coding.

However, it should be noted that coding “by letter” is not economical at all. The fact is that there is always a dependence between adjacent letters of any meaningful text. For example, after a vowel in the Russian language there cannot be “ъ” or “ь”; “I” or “yu” cannot appear after hissing ones; after several consonants in a row, the probability of a vowel increases, etc.

We know that when dependent systems are combined, the total entropy is less than the sum of the entropies of the individual systems; therefore, the information conveyed by a piece of connected text is always less than the information per character multiplied by the number of characters. Taking this into account, a more economical code can be constructed by encoding not each letter individually but whole "blocks" of letters. For example, in Russian text it makes sense to encode as a whole some frequently occurring combinations of letters, such as "tsya", "ayet", "nie", etc. The encoded blocks are arranged in descending order of frequency, like the letters in Table 18.8.1, and binary coding is carried out according to the same principle.

In some cases, it turns out to be reasonable to encode not even blocks of letters, but entire meaningful pieces of text. For example, to relieve the telegraph during the holidays, it is advisable to encode entire standard texts with conventional numbers, such as:

“Congratulations on the New Year, I wish you good health and success in your work.”

Without dwelling specifically on block coding methods, we will limit ourselves to formulating Shannon’s theorem related here.

Let there be a source of information and a receiver connected by a communication channel (Fig. 18.8.1).

The productivity of the information source is known, i.e., the average number of binary units of information coming from the source per unit of time (numerically it is equal to the average entropy of the messages produced by the source per unit of time). Let, in addition, the channel capacity be known, i.e., the maximum amount of information (for example, binary symbols 0 or 1) that the channel is capable of transmitting in the same unit of time. The question arises: what must the channel capacity be for the channel to "cope" with its task, that is, for information to arrive from the source to the receiver without delay?

The answer to this question is given by Shannon's first theorem. Let us formulate it here without proof.

Shannon's 1st theorem

If the capacity of the communication channel is greater than the entropy of the information source per unit of time, then it is always possible to encode a sufficiently long message so that it is transmitted over the communication channel without delay. If, on the contrary, the channel capacity is less than the source entropy per unit of time, then the transfer of information without delay is impossible.