I have a general conceptual question about endianness and how it affects TCP socket communication in C/C++. Here's an example:
You have two servers communicating over TCP sockets, and one uses big endian while the other uses little endian. If you send an integer over the socket from one server to the other, I understand that the byte ordering is reversed and the integer will not print what is expected. Correct? I saw somewhere (I can't find where anymore) that if you send a char over the socket, endianness doesn't change the value and it prints as expected. Is this correct? If so, why? I feel like I've done this before in the past, but I could be delusional.
Could anybody clear this up for me?
Edit: Is it because char is only 1 byte?
Think about the size of each data type.
An integer is typically four bytes, which you can think of as four individual bytes side by side. The endianness of an architecture determines whether the most significant byte is the first of the four bytes or the last. A char, however, is only one byte, so there's no alternative order (assuming I am correct that endianness does not affect the order of the bits within each byte; see the image on Wikipedia's page on Endianness).
If you send a char over a socket, it will be one byte on both machines. If you send an int over a socket, since it's four bytes, one machine may interpret the bytes in a different order than the other, depending on its endianness. You should set up a simple way to test this and get back with some results!
The only thing you can send over a TCP socket is bytes. You cannot send an integer over a TCP socket without first creating some byte representation for it. The C/C++ int type can be stored in memory in whatever way the platform likes. If that just happens to be the form in which you need to send it over the TCP socket, then fine. But if it's not, then you have to convert it into the form the protocol requires before you send, and back into your native format after you receive.
As a bit of a sloppy analogy, consider the way I communicate with you. My native language might be Spanish, and who knows what goes on in my brain. Internally, I might represent the number three as "tres" or some weird pattern of neurons. Who knows? But when I communicate with you, I must represent the number three as "3" or "three" because that's the protocol you and I have agreed to, the English language. So unless I'm a terrible English speaker, how I internally store the number three won't affect my communication with you.
Since this group requires me to produce streams of English characters to talk to you, I must convert my internal number representations to streams of English characters. Unless I'm terrible at doing that, how I store numbers internally will not affect the streams of English characters I produce.
So unless you do foolish things, this will never matter. Since you will be sending and receiving bytes over the TCP socket, the memory format of the int type won't matter, because you won't be sending or receiving instances of the C/C++ int type but logical integers.
For example, if the protocol specification for the data you are sending over TCP says that you need to send a four-byte integer in little-endian format, then you should write code to do that. If the code takes your platform's endianness into consideration, that would be purely as an optimization that should not affect code behavior.
Byte endianness refers to the order of the individual bytes in a data type larger than 1 byte (such as short, int, long, etc.).
So your assumption is correct for int (since it must be at least 16 bits, and is usually more nowadays). It is also usually correct for char, since a char is usually 1 byte. But you could have chars with more than 8 bits, in which case endianness matters.