Catch up on stories from the past week (and beyond) at the Slashdot story archive

 



Forgot your password?
typodupeerror
×
Programming

Journal npsimons's Journal: Heap Debugging in C 5

I learned some very interesting things today. First, valgrind kicks ass. Second, don't always believe what you read in man pages. Take this little snippet I'd been working on for the past 1.5 days:

uint16_t
checksum (packet)
    struct re_packet *packet;
{
  int i = 0;
  uint16_t checksum = 0, temporary = 0;
  unsigned int intermediate = 0;
  unsigned int *binary_blob = calloc (1, sizeof (struct re_header) + packet->header.size);
  if (binary_blob == NULL)
    return (-1);
 
/// Necessary because checksum doesn't include itself.
  temporary = packet->header.checksum;
  packet->header.checksum = 0;
 
  if (!memcpy (binary_blob, &packet->header, sizeof (struct re_header)))
    return (-1);
  if (!memcpy (binary_blob + sizeof (struct re_header),
          packet->data, packet->header.size))
    return (-1);
 
  for (i = 0; i < (packet->header.size + sizeof (struct re_header)) / 4 ;
      ++i)
    intermediate ^= binary_blob[i];
 
// Restore original checksum.
  packet->header.checksum = temporary;
 
  checksum = intermediate ^ (intermediate >> 16);
 
  SAFE_FREE (binary_blob);
 
  return (checksum);
}

Now, you see, the "binary_blob" has to be a pointer to an array of unsigned integers because that's what we're using for the checksum. So far so good, right? Except that the program using this little snippet has weird memory errors, and when I say weird, I mean segfaulting in calloc(). A sure sign of heap corruption. But where could it be coming from. Just as a check, I insert some code to print out how many iterations it goes through before it crashes; ends up being just one. Hmm, heap corruption usually takes longer than that.

As I watch the compile for the zillionth time, it occurs to me that wxWidgets is being linked because this project has a separate GUI; but this little piece of code doesn't need it. Perhaps wxWidgets is doing some weird initialization stuff that is really not necessary in a piece of C code. So I also quickly learn how to turn that off in automake, and recompile. Still the same error in the same place. Hmmm . . .

Go learn how to use valgrind with memory checking. It says that "binary_blob" is being written beyond it's end. Huh? It's exactly as big the writes *should* be. Double-check memcpy(3); nope, it clearly says the third argument "n" is the number of bytes.

After banging my head against this for another couple of hours, I finally get the crazy idea, "hey let's just try casting to see if that fixes it, even though it shouldn't make a difference." Et voila! It works:

uint16_t
checksum (packet)
    struct re_packet *packet;
{
  int i = 0;
  uint16_t checksum = 0, temporary = 0;
  unsigned int intermediate = 0;
  unsigned int *binary_blob = calloc (1, sizeof (struct re_header) + packet->header.size);
  if (binary_blob == NULL)
    return (-1);
 
/// Necessary because checksum doesn't include itself.
  temporary = packet->header.checksum;
  packet->header.checksum = 0;
 
  if (!memcpy ((void *) binary_blob, &packet->header,
    sizeof (struct re_header)))
    return (-1);
  if (!memcpy ((void *) binary_blob + sizeof (struct re_header),
          packet->data, packet->header.size))
    return (-1);
 
  for (i = 0; i < (packet->header.size + sizeof (struct re_header)) / 4 ;
      ++i)
    intermediate ^= binary_blob[i];
 
// Restore original checksum.
  packet->header.checksum = temporary;
 
  checksum = intermediate ^ (intermediate >> 16);
 
  SAFE_FREE (binary_blob);
 
  return (checksum);
}

Apparently, memcpy(3) does some hinky where it actually looks at the type of the dest you pass it and decides that, no, "n" doesn't actually mean bytes, it means the number of elements you want to write to in that array. At least that's my theory.

(Posting this here because I don't have my own blog and I'm too lazy to look up the C newsgroup on groups.google.com; excuse the incoherence; I'm really frazzled and just happy it works (for now)).

This discussion has been archived. No new comments can be posted.

Heap Debugging in C

Comments Filter:
  • The problem was in this line:

    if (!memcpy ((void *) binary_blob + sizeof (struct re_header), packet->data, packet->header.size))

    Adding the (void *) cast there fixed it, but not because memcpy() checks the type of the pointer you're passing it. This is C, functions can't do that - pointer types only exist inside the compiler.

    The problem was that pointer arithmetic goes by elements, not bytes. That is, just like "p++" advances p to the next element (not the next byte), "p + n" adds n elements, not n bytes. When binary_blob was a pointer to int, you weren't adding sizeof(struct re_header) bytes, you were a

    • "p + n" adds n elements

      Very interesting, I have never done that before. I have done the p++ thing, but never skipped a certain larger size than one.

      • by npsimons ( 32752 ) *

        Very interesting, I have never done that before. I have done the p++ thing, but never skipped a certain larger size than one.

        It's very handy when you have to flatten a data structure of variable size (ie, there is a pointer somewhere in it) to a byte string. Just have to be careful with the pointer arithmetic (as my little problem demonstrated).


    • by npsimons ( 32752 ) *

      Adding the (void *) cast there fixed it, but not because memcpy() checks the type of the pointer you're passing it. This is C, functions can't do that - pointer types only exist inside the compiler.

      This is what I had always assumed until yesterday. It was freaking me out that a C function might be polymorphic! But I have seen some interesting voodoo performed with C macros in the past.

      The problem was that pointer arithmetic goes by elements, not bytes. That is, just like "p++" advances p to the next elem

      • by Mr2001 ( 90979 )

        What's weird is that I am using CFLAGS="-O0 -Wall -Wextra -g3", and nary a warning came up for the old code or the new.
        Try using -Wpointer-arith, or maybe -ansi -pedantic. I was discussing this last night and my colleagues were also shocked that gcc -Wall didn't emit a warning.

He has not acquired a fortune; the fortune has acquired him. -- Bion

Working...