> hexdump -C text16B.txt
00000000  fe ff 00 54 00 68 00 69  00 73 00 20 00 69 00 73  |...T.h.i.s. .i.s|
00000010  00 20 00 73 00 6f 00 6d  00 65 00 20 00 74 00 65  |. .s.o.m.e. .t.e|
00000020  00 78 00 74 00 0a 00 2d  00 2d 00 e4 00 2d 00 2d  |.x.t...-.-...-.-|
00000030  00 3a 00 20 00 20 00 61  00 20 00 77 00 69 00 74  |.:. . .a. .w.i.t|
00000040  00 68 00 20 00 75 00 6d  00 6c 00 61 00 75 00 74  |.h. .u.m.l.a.u.t|
00000050  00 0a 00 2d 00 2d 22 1a  00 2d 00 2d 00 3a 00 20  |...-.-"..-.-.:. |
00000060  00 20 00 73 00 71 00 72  00 74                    |. .s.q.r.t|
0000006a

> hexdump -C text16L.txt
00000000  ff fe 54 00 68 00 69 00  73 00 20 00 69 00 73 00  |..T.h.i.s. .i.s.|
00000010  20 00 73 00 6f 00 6d 00  65 00 20 00 74 00 65 00  | .s.o.m.e. .t.e.|
00000020  78 00 74 00 0a 00 2d 00  2d 00 e4 00 2d 00 2d 00  |x.t...-.-...-.-.|
00000030  3a 00 20 00 20 00 61 00  20 00 77 00 69 00 74 00  |:. . .a. .w.i.t.|
00000040  68 00 20 00 75 00 6d 00  6c 00 61 00 75 00 74 00  |h. .u.m.l.a.u.t.|
00000050  0a 00 2d 00 2d 00 1a 22  2d 00 2d 00 3a 00 20 00  |..-.-.."-.-.:. .|
00000060  20 00 73 00 71 00 72 00  74 00                    | .s.q.r.t.|
0000006a
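The first two bytes are a byte order mark, or BOM: the code point U+FEFF written in the file's own byte order. It shows up as fe ff in the big-endian file and ff fe in the little-endian one, which is how a reader can tell the two apart.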
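To make the BOM check concrete, here is a minimal Python sketch. The function name `sniff_utf16_bom` is made up for this post, and it only looks for the two UTF-16 marks (it does not try to rule out UTF-32, whose little-endian BOM also begins with ff fe). The sample bytes are the BOM plus "Th" from the dumps above.

```python
import codecs


def sniff_utf16_bom(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):  # b'\xfe\xff'
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        return "utf-16-le"
    return "unknown"


# BOM followed by "Th", copied from the two hexdumps.
big = bytes.fromhex("fe ff 00 54 00 68")
little = bytes.fromhex("ff fe 54 00 68 00")

print(sniff_utf16_bom(big))
print(sniff_utf16_bom(little))

# The generic "utf-16" codec reads the BOM itself and strips it.
print(big.decode("utf-16"), little.decode("utf-16"))
```

Python's plain "utf-16" codec does this sniffing for you, so the helper is only there to show what the two bytes mean.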
Notice the change in our example code points from last time. The ä ('LATIN SMALL LETTER A WITH DIAERESIS'), which is c3 a4 in UTF-8, is 00 e4 in big-endian UTF-16 and e4 00 in little-endian.
The √ is e2 88 9a in UTF-8, 22 1a in big-endian UTF-16, and 1a 22 in little-endian.
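You can check these byte values without a hex dump. The sketch below, assuming Python 3.8 or later (for `bytes.hex` with a separator), encodes each example character three ways; `encodings_of` is just a throwaway name for this post.

```python
import unicodedata


def encodings_of(ch: str) -> dict:
    """Map each encoding name to the hex bytes it produces for one character."""
    return {
        enc: ch.encode(enc).hex(" ")
        for enc in ("utf-8", "utf-16-be", "utf-16-le")
    }


for ch in ("ä", "√"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    for enc, hex_bytes in encodings_of(ch).items():
        print(f"  {enc:<10} {hex_bytes}")
```

The explicit "utf-16-be" and "utf-16-le" codecs do not write a BOM, which is why only the character's own bytes appear.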
The most important point is that the bytes are completely different depending on the encoding (UTF-8 versus UTF-16). When the encoding used to write the bytes and the decoding used to read them don't match, that mismatch is usually the source of the weird or unexpected characters you see printed on the screen.
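Here is a small sketch of such a mismatch, using one line from the sample file. Latin-1 is picked as the "wrong" decoder only because it accepts any byte sequence; other mismatches produce different garbage or an outright error.

```python
text = "--ä--:  a with umlaut"

# Write as UTF-8, then read back with the wrong codec.
utf8_bytes = text.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # the two bytes c3 a4 show up as two separate characters

# Nothing was lost: undoing the wrong step recovers the original text.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == text)

# UTF-16 read with the wrong byte order gives unrelated code points.
swapped = "ä".encode("utf-16-be").decode("utf-16-le")
print(f"U+{ord(swapped):04X}")
```

The round trip works here only because Latin-1 maps every byte to a character; with a lossy decoder the original bytes may not be recoverable.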
I wish they had come up with some other words to describe the byte order than big- and little-endian. The reference to Gulliver is amusing, but a string of bytes has two ends. Which is supposed to be the big one? It's similar to the situation with genes, where people talk about the 5' and 3' ends, but of course each end of double-stranded DNA has both 5' and 3' ends!
Big-endian is described as "most significant byte first." How about "highest-value byte first" or "natural order"?
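"Most significant byte first" is easier to see on a plain integer than on a string. A quick sketch, using the code point of √ (0x221A) as the number:

```python
code_point = 0x221A  # the square root sign from the example file

# Big-endian: the high byte (0x22) is written first, as you would write the number.
big = code_point.to_bytes(2, byteorder="big")

# Little-endian: the low byte (0x1a) is written first.
little = code_point.to_bytes(2, byteorder="little")

print(big.hex(" "))     # 22 1a
print(little.hex(" "))  # 1a 22

# Reading the bytes back with the matching order restores the number.
print(hex(int.from_bytes(big, byteorder="big")))
print(hex(int.from_bytes(little, byteorder="little")))
```

The two-byte results match what the hexdumps show for √ in each file, since a UTF-16 code unit is a 16-bit number stored in one order or the other.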