
Re: Pulling Arduino data apart


Jack Purdum
 

> The spec for the serial link should define if those integers go over as big or little endian.
> ...
> So the serial link spec may need to fully define the format of the data stream, not just say whether
> it is big or little endian.
Absolutely agree, which is what I was saying from the very start. Allard's code can be endian agnostic because it runs in a single known environment. I haven't looked at his code for some time, so I don't know if there is any place in the code where he needs to break apart a basic data type.

My comment about putting bits on the floor meant that you had to know something about the byte order, otherwise why are you interested only in the high byte. Your code:

    sendbyte((data32>>24)&0xff);

to send a byte works great if the data is big endian:

    01010101 00000000 00000000 00000000    // The first byte (01010101) is the byte of interest

However, if you don't know the byte order and it is:

    00000000 00000000 00000000 01010101

Your code would throw the relevant data on the floor. Your code is only safe if you know the order. A union is a simple way to determine that order.
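As a rough illustration of the union approach mentioned above, here is a minimal sketch. The type name word_bytes_t and both function names are hypothetical and do not come from anyone's code in this thread. It assumes a C compiler (reading a union member other than the one last written is allowed in C, but not in C++) and a host that is either plain big endian or plain little endian.

```c
#include <stdint.h>

/* Hypothetical helper type: the same four bytes viewed either as one
   32 bit word or as an array of bytes in memory order. */
typedef union {
    uint32_t word;
    uint8_t  bytes[4];
} word_bytes_t;

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    word_bytes_t probe;
    probe.word = 0x01020304u;
    /* A little endian host keeps 0x04 at the lowest address. */
    return probe.bytes[0] == 0x04;
}

/* Returns 1 if the host stores the most significant byte first. */
int host_is_big_endian(void)
{
    word_bytes_t probe;
    probe.word = 0x01020304u;
    /* A big endian host keeps 0x01 at the lowest address. */
    return probe.bytes[0] == 0x01;
}
```

The probe only reports how the local machine lays out a word in memory. It says nothing about the order the other end of a link expects, which still has to come from the link spec.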

Jack, W8TEE






From: Jerry Gaffke via Groups.Io <jgaffke@...>
To: [email protected]
Sent: Thursday, March 8, 2018 2:42 PM
Subject: Re: [BITX20] Pulling Arduino data apart

Consider a serial link, perhaps a UART, over which we wish to send 32 bit integers.
The spec for the serial link should define if those integers go over as big or little endian.
Let's assume the serial link spec says it is little endian, and that each character has 8 bits.

Here's C code for machine A to send a 32 bit integer as a sequence of four bytes in little endian order:
    sendbyte(data32);  sendbyte(data32>>8);  sendbyte(data32>>16);  sendbyte(data32>>24);
And code for machine B to receive that 32 bit integer (assumes getbyte() returns an unsigned 8 bit integer; the casts widen each byte before shifting so the shift cannot overflow a 16 bit int):
    data32=getbyte();  data32|=(uint32_t)getbyte()<<8;  data32|=(uint32_t)getbyte()<<16;  data32|=(uint32_t)getbyte()<<24;

This C code doesn't care if the machine it is on is big endian or little endian.
However the C code on both ends must be aware of the integer size it is dealing with, be it 8,16,32 bits.
So the serial link spec may need to fully define the format of the data stream, not just say whether
it is big or little endian.
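The send and receive one-liners above can be fleshed out into a small round trip, sketched here under some assumptions. The four byte array link_buf stands in for the serial link, and sendbyte(), getbyte(), send_u32_le() and recv_u32_le() are hypothetical stand-ins written for this illustration, not real UART routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the serial link: a four byte buffer. */
uint8_t link_buf[4];
size_t  link_wr = 0;
size_t  link_rd = 0;

/* Puts one byte on the pretend link. */
void sendbyte(uint8_t b)
{
    link_buf[link_wr++ % 4] = b;
}

/* Takes the next byte off the pretend link. */
uint8_t getbyte(void)
{
    return link_buf[link_rd++ % 4];
}

/* Sends a 32 bit value least significant byte first. */
void send_u32_le(uint32_t data32)
{
    sendbyte((uint8_t)(data32 & 0xffu));
    sendbyte((uint8_t)((data32 >> 8) & 0xffu));
    sendbyte((uint8_t)((data32 >> 16) & 0xffu));
    sendbyte((uint8_t)((data32 >> 24) & 0xffu));
}

/* Rebuilds the value from four bytes that arrive least significant first. */
uint32_t recv_u32_le(void)
{
    uint32_t data32;
    data32  = (uint32_t)getbyte();
    data32 |= (uint32_t)getbyte() << 8;
    data32 |= (uint32_t)getbyte() << 16;
    data32 |= (uint32_t)getbyte() << 24;
    return data32;
}
```

After send_u32_le(0x12345678), the buffer holds 78 56 34 12 in that order on any host, because the shifts work on the value of the integer and never look at how it sits in memory.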

Plenty of C code out there that is not endian agnostic like that, and I'm fine with it.
Those 24 bit shifts are expensive if your compiler is turned down to dumb,
so a typecast of an int32 pointer to an array of bytes may look like a more efficient way to code.
Most machines these days are little endian with 8/16/32/64 bit word sizes, and I'm fine
with code that assumes this is the case.  (There are some big endian machines though.)
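For contrast, here is a minimal sketch of the pointer-style shortcut, using memcpy() in place of a raw cast. The function name u32_to_host_bytes() is hypothetical. The bytes come out in whatever order the host keeps them in memory, so the result is only correct on the wire if the host order happens to match the link spec.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the in-memory bytes of value into out[]
   exactly as the host stores them, with no reordering. */
void u32_to_host_bytes(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}
```

On a little endian machine u32_to_host_bytes(0x12345678, out) fills out with 78 56 34 12, and on a big endian machine with 12 34 56 78. That is cheap, and fine when both ends are known to agree, but it is not endian agnostic.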

But if you are trying to code for a machine that could be either big or little endian
or might have some weird word length in hardware, I'm of the opinion that the above
is the best way to do it.  If nothing else, it's very easy to read.

Endian-ness has even more repercussions when creating hardware.
I always found that working with the big-endian VME bus was a PITA,
the extra shifts were rather expensive back in the days of TTL.

It seems obvious at first glance, big-endian means we send over the most significant byte first,
and little endian means we send over the least significant byte first.
But implementation of this in a mixed environment can become a real head scratcher.
Especially if the implementation is not thoroughly thought out before coding starts.

> Your sendbyte() example, the sendbyte(data32>>24) leaves the high byte for sending.
> If you don't know the endian order, how do you know you didn't just rotate the data of interest onto the floor?

I don't quite follow.
sendbyte(data32>>24) will always send the 8 msb's of that 32 bit word, regardless of what machine you are on.
I know I didn't rotate the data of interest onto the floor because I know that my data was in the 8 msb's of the 32 bit word.
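A tiny sketch of that point, with a hypothetical helper named top_byte(). The shift is defined on the numeric value of the word, so the result does not depend on which address the host uses for the most significant byte.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 8 most significant bits of a
   32 bit value. The shift acts on the value, not on memory layout. */
uint8_t top_byte(uint32_t data32)
{
    return (uint8_t)(data32 >> 24);
}
```

So top_byte(0x55000000) is 0x55 and top_byte(0x00000055) is 0x00 on every machine. The second case is not data lost to byte order: as a value, that word simply has its data in the low 8 bits.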

> The second example is no different. Indeed, since the shift right operator "backfills" with 0's
> and has higher precedence than the bitwise AND operator, your example always sends 0 to the function. Why bother?

Hmm.  This second example?
    sendbyte((data32>>24)&0xff);
The only difference from the previous is that it makes it clear to the reader
that we are only interested in sending 8 bits.  And it might save us from
some weird bug if the argument of sendbyte() was not defined as an unsigned 8 bit int.
Looks fine to me.
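Here is a sketch of the kind of bug the mask guards against, under stated assumptions: sendbyte() is a hypothetical stand-in that takes a plain int and records it in last_sent, and the data is a signed 32 bit value. Right shifting a negative signed value is implementation defined in C, and most compilers sign extend, so the unmasked result may carry extra high bits.

```c
#include <stdint.h>

/* Hypothetical stand-in: takes an int rather than an unsigned 8 bit
   value, and just remembers what it was handed. */
int last_sent;

void sendbyte(int b)
{
    last_sent = b;
}

/* Shift first, then mask: exactly 8 bits reach sendbyte(). */
void send_top_byte_masked(int32_t data32)
{
    sendbyte((data32 >> 24) & 0xff);
}

/* No mask: a negative data32 may arrive sign extended. */
void send_top_byte_unmasked(int32_t data32)
{
    sendbyte(data32 >> 24);
}
```

The shift is evaluated before the AND, both because of the parentheses and because >> binds tighter than & in C, so the masked version passes the top byte rather than 0. For data32 = 0x55000000 both versions pass 0x55. For data32 = -1 the masked version passes 0xff, while the unmasked one typically passes -1.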

Jerry


On Thu, Mar 8, 2018 at 10:34 am, Jack Purdum wrote:
OK, so what happens if you send an int from Allard's code to a 64-bit Intel i7? Compiler vendors are completely free to decide the byte order of all of their data types. My software company used to produce C programming tools (compilers, editors, assemblers, linkers) for both 8 bit and 16 bit machines. We made sure our Endians were the same, simply from a marketing standpoint. However, sending binary data from an 8 bit compiler to someone else's 16 bit compiler has no guarantee of working. Data structure packing and endian use is totally up to the compiler vendor. Indeed, there was one 8-bit MSDOS compiler vendor who chose to use -1 for NULL. The old X3J11 C standards committee made no restrictions on such things and they are defined as "implementation dependent". That's why you should use NULL instead of 0 when checking string lengths. Now you could send the data as ASCII, but then you slow the transmission because values 0 through 255 only take 1 binary byte, but up to 3 ASCII bytes.

Your statement that "I can code all day in C without worrying about the big vs little endian" issue is only true at the source code level. If you are sending binary data, which is what I said in my post, you very definitely need to worry about the endian problem. As to 100 clock ticks, that seems high. An ldi assembler instruction takes 3 clock cycles, or 12 for a 32 bit long. Each rotate left (or right) is a single clock cycle, so I get 42 clock cycles to rotate a long off the map, and that includes the time to load it. So 42 cycles at 16 MHz, or 0.000002625 of a second, seems pretty quick. Still, that's neither here nor there.

Your sendbyte() example, the sendbyte(data32>>24) leaves the high byte for sending. If you don't know the endian order, how do you know you didn't just rotate the data of interest onto the floor? The second example is no different. Indeed, since the shift right operator "backfills" with 0's and has higher precedence than the bitwise AND operator, your example always sends 0 to the function. Why bother?

Nope, there are times when you need to know the endian order and you can use a union to find it out. It can also be used to send binary data for a serial connection to a totally different platform and still have it work. Knowing how to use a union is a good thing.

