Je t’embrasse Salutations from Silicon Valley, California

20 Jan 2012

std::list iterate + erase

I had an interesting problem the other day where I needed to iterate over an std::list, removing invalid items from that list. The catch was that the list held pointers, each pointing to an allocated value that had to be freed.

Here is what I started with:

#include <stdio.h>
#include <stdlib.h>  /* malloc(), free() */
#include <list>
using std::list;

struct COORD { int x,y; };

int i,j;
struct COORD *c;
list<struct COORD *> coordList;
list<struct COORD *>::iterator it1;

/* Prep the list */
for (i=0, j=0; i<10 && j<10; i++, j++) {
    c = (struct COORD *)malloc(sizeof(struct COORD));
    c->x = i; c->y = j;
    coordList.push_back(c);
}

/* Display while erasing all - for loop */
for (it1=coordList.begin(); it1!=coordList.end(); it1++) {
    c = (*it1);
    printf("A: (%d,%d)\n", c->x, c->y);
    coordList.erase(it1); /* segfault */
    free(c);
}

As you can see, the problem with this is in the .erase() call. erase() destroys the list node and invalidates "it1", yet the loop header then runs it1++ on that dead iterator. That is undefined behavior: even if you get lucky once or twice, eventually "it1" will point to something outside of the bounds of coordList.begin() and coordList.end(). At this point, you will continue, because it1!=coordList.end()... and you will segfault when you try to dereference or erase the now-invalid iterator.

The fix is to take the increment out of the loop header, and to advance the iterator before the element it points to is erased.

/* Display while erasing all - for loop, fixed */
for (it1=coordList.begin(); it1!=coordList.end(); ) {
    c = (*it1);
    printf("A: (%d,%d)\n", c->x, c->y);
    coordList.erase(it1++); /* it1 moves on before the node is erased */
    free(c);
}

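Note that by removing the increment step from the loop header, we have to do the increment ourselves. The post-increment inside the erase() call advances the iterator first and hands its old value to erase(), so the iterator is already on the next element when the node goes away.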
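To make the post-increment idiom easy to try out, here is a minimal self-contained sketch. The helper names makeCoords() and drainCoords() are made up for this illustration, and allocation failure is simply treated as fatal to keep things short.

```cpp
#include <cassert>
#include <cstdlib>
#include <list>

struct COORD { int x, y; };

// Hypothetical helper: build a list of n heap-allocated points (i, i).
std::list<COORD *> makeCoords(int n) {
    std::list<COORD *> coords;
    for (int i = 0; i < n; ++i) {
        COORD *c = static_cast<COORD *>(std::malloc(sizeof(COORD)));
        assert(c != NULL);  // sketch only: treat allocation failure as fatal
        c->x = i;
        c->y = i;
        coords.push_back(c);
    }
    return coords;
}

// Hypothetical helper: erase and free every element; returns how many were freed.
int drainCoords(std::list<COORD *> &coords) {
    int freed = 0;
    for (std::list<COORD *>::iterator it = coords.begin(); it != coords.end(); ) {
        COORD *c = *it;
        // it++ advances 'it' and passes a copy of the old position to erase(),
        // so 'it' never refers to the node being destroyed.
        coords.erase(it++);
        std::free(c);
        ++freed;
    }
    return freed;
}
```

Calling drainCoords() on a list built by makeCoords(10) should free ten items and leave the list empty.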

Of course, you could also use the return value of the .erase() method, which is documented as:

A bidirectional iterator pointing to the new location of the element that followed the last element erased by the function call, which is the list end if the operation erased the last element in the sequence.

Because of that we can do something a bit fancier:

/* Display while erasing all - for loop #2 */
for (it1=coordList.begin(); it1!=coordList.end(); ) {
    c = (*it1);
    printf("B: (%d,%d)\n", c->x, c->y);
    it1 = coordList.erase(it1);
    free(c);
}

See how we set the iterator to the next valid element by using the return value of the .erase() method. This is a relatively elegant solution, but only because every pass through the loop updates the iterator via .erase(). If we had to limit the erase, only deleting odd values (for example), we would have to remember to increment the iterator ourselves whenever we do not erase (otherwise the loop would stay pointing at the same list element forever).

/* Display while erasing some - for loop #3 */
for (it1=coordList.begin(); it1!=coordList.end(); ) {
    c = (*it1);

    if ((c->x % 2) == 1) {
        printf("C: (%d,%d)\n", c->x, c->y);
        it1 = coordList.erase(it1);
        free(c);
    }
    else {
        ++it1;
    }
}
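The conditional version can be wrapped up as a small function and checked end to end. This is only a sketch: eraseOdd() and sumOfSurvivors() are hypothetical names, and the test for oddness is written as "% 2 != 0" (an assumption on my part) so that negative odd values would be caught as well.

```cpp
#include <cassert>
#include <cstdlib>
#include <list>

struct COORD { int x, y; };

// Hypothetical helper: erase and free every element whose x is odd.
// Returns the number of elements removed.
int eraseOdd(std::list<COORD *> &coords) {
    int removed = 0;
    std::list<COORD *>::iterator it = coords.begin();
    while (it != coords.end()) {
        COORD *c = *it;
        if (c->x % 2 != 0) {
            it = coords.erase(it);  // erase() returns the element after the erased one
            std::free(c);
            ++removed;
        } else {
            ++it;                   // nothing erased, so advance by hand
        }
    }
    return removed;
}

// Hypothetical driver: build points (0,0)..(n-1,n-1), drop the odd ones,
// and return the sum of the x values that are left.
int sumOfSurvivors(int n) {
    std::list<COORD *> coords;
    for (int i = 0; i < n; ++i) {
        COORD *c = static_cast<COORD *>(std::malloc(sizeof(COORD)));
        assert(c != NULL);          // sketch only: treat allocation failure as fatal
        c->x = i;
        c->y = i;
        coords.push_back(c);
    }
    eraseOdd(coords);

    int sum = 0;
    while (!coords.empty()) {       // free whatever survived
        sum += coords.front()->x;
        std::free(coords.front());
        coords.pop_front();
    }
    return sum;
}
```

With ten points, the survivors are 0, 2, 4, 6 and 8, so sumOfSurvivors(10) comes out as 20.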

That's it. My trial-and-error wisdom, passed on.

Filed under: C/C++

24 Dec 2011

Traversing a file in C

I cannot begin to tell you how useful the following code has been in my endeavors to do complicated forward-backward grep-esque searching. Along with the regular-expression matching that I have put together previously, this will round off pretty much everything you need to do your own fancy-grepping.

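The code below is based upon the fgetc() and fgets() functions. The first two, rgetc() and rgets(), are essentially the reverse of the original functions. They read from the file, but instead of moving the file-pointer forward, they move it back: rgetc() returns the character before the one most recently read, and rgets() fetches the line before the one most recently read (returning 0 on success and EINVAL otherwise). Thus, you could start at end-of-file, and traverse all the way back to beginning-of-file.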

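#include <errno.h>   /* EINVAL */
#include <stdio.h>
#include <string.h>  /* strstr(), used by tgets() below */

/* Step back over the character just read and return the one before it. */
int
rgetc(FILE *stream)
{
  if (fseek(stream, -2, SEEK_CUR) == -1) return EOF;
  return fgetc(stream);
}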

int
rgets(char *s, int size, FILE *stream)
{
  int n=0;
  int c;

  while (1) {
    if ((c = rgetc(stream)) == EOF) {
      /* if we are too close to BOF to rgetc() */
      if ((ftell(stream) <= 2) && (n+1 == 2)) {
        rewind(stream);
        n=2;
      }
      /* otherwise EOF == ERROR */
      else return EINVAL;
    }
    if (c == '\n') n++;
    if (n == 2) break;
  }

  if (fgets(s, size, stream) == NULL) return EINVAL;
  return 0;
}
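To see how rgets() walks a file backward, here is a minimal sketch. linesBackward() is a hypothetical helper, the temporary file and its three lines are made up for the illustration, and the sketch assumes that fseek() fails without moving the file-pointer when asked to seek before the start of the file (which is what the beginning-of-file check relies on).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>
#include <vector>

// rgetc() and rgets() as described above.
int rgetc(FILE *stream) {
    if (fseek(stream, -2, SEEK_CUR) == -1) return EOF;
    return fgetc(stream);
}

int rgets(char *s, int size, FILE *stream) {
    int n = 0;
    int c;
    while (1) {
        if ((c = rgetc(stream)) == EOF) {
            // too close to beginning-of-file to step back any further
            if ((ftell(stream) <= 2) && (n + 1 == 2)) {
                rewind(stream);
                n = 2;
            } else {
                return EINVAL;
            }
        }
        if (c == '\n') n++;
        if (n == 2) break;
    }
    if (fgets(s, size, stream) == NULL) return EINVAL;
    return 0;
}

// Hypothetical demo: put three lines in a temporary file, position the
// stream at end-of-file, then collect whatever rgets() hands back.
std::vector<std::string> linesBackward() {
    std::vector<std::string> seen;
    FILE *fp = std::tmpfile();
    assert(fp != NULL);                 // sketch only: treat failure as fatal
    std::fputs("one\ntwo\nthree\n", fp);
    std::fseek(fp, 0, SEEK_END);        // as if every line had just been read
    char buf[64];
    while (rgets(buf, (int)sizeof buf, fp) == 0)
        seen.push_back(buf);
    std::fclose(fp);
    return seen;
}
```

Starting from end-of-file, this collects "two" and then "one". The last line is skipped because rgets() returns the line before the one the file-pointer has just passed; the third call fails with EINVAL since nothing precedes the first line.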

Finally, there are times when what you really want is to read without moving the file-pointer at all. tgetc() re-reads the character just read and tgets() re-reads the line just read, so you get a character/line into a buffer, but the file-pointer ends up exactly where it started. (Very useful when you need to double-parse a line.)

int
tgetc(FILE *stream)
{
  if (fseek(stream, -1, SEEK_CUR) == -1) return EOF;
  return fgetc(stream);
}

int
tgets(char *s, int size, FILE *stream)
{
  int n=0;
  int c;

  while (1) {
    if ((c = rgetc(stream)) == EOF) {
      /* if we are too close to BOF to rgetc(), we are on the first line */
      if ((ftell(stream) <= 2) && (n == 0)) {
        rewind(stream);
        n=1;
      }
      /* otherwise EOF == ERROR */
      else return EINVAL;
    }
    if (c == '\n') n++;
    if (n == 1) break;
  }

  /* Look for a newline, otherwise EOF */
  if (fgets(s, size, stream) == NULL) return EINVAL;
  if (strstr(s, "\n") == NULL) return EOF;
  return 0;
}
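A quick way to convince yourself that tgets() leaves the file-pointer alone is to read a file forward and re-read each line as you go. The sketch below is an illustration only: rereadMatches() is a hypothetical helper, the file contents are made up, and this copy of tgets() treats "hit beginning-of-file before seeing any newline" as the first-line case, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

int rgetc(FILE *stream) {
    if (fseek(stream, -2, SEEK_CUR) == -1) return EOF;
    return fgetc(stream);
}

// Re-read the line that was just read, leaving the file-pointer where it was.
int tgets(char *s, int size, FILE *stream) {
    int n = 0;
    int c;
    while (1) {
        if ((c = rgetc(stream)) == EOF) {
            // first line of the file: nothing before it, so start from the top
            if ((ftell(stream) <= 2) && (n == 0)) {
                rewind(stream);
                n = 1;
            } else {
                return EINVAL;
            }
        }
        if (c == '\n') n++;
        if (n == 1) break;
    }
    if (fgets(s, size, stream) == NULL) return EINVAL;
    if (strstr(s, "\n") == NULL) return EOF;   // no newline: ran into end-of-file
    return 0;
}

// Hypothetical demo: count the lines for which tgets() returns the same text
// as the preceding fgets() and leaves ftell() unchanged.
int rereadMatches() {
    FILE *fp = std::tmpfile();
    assert(fp != NULL);                 // sketch only: treat failure as fatal
    std::fputs("alpha\nbeta\n", fp);
    std::rewind(fp);

    char line[64], again[64];
    int matches = 0;
    while (std::fgets(line, (int)sizeof line, fp) != NULL) {
        long before = std::ftell(fp);
        if (tgets(again, (int)sizeof again, fp) == 0 &&
            std::strcmp(line, again) == 0 && std::ftell(fp) == before)
            ++matches;
    }
    std::fclose(fp);
    return matches;
}
```

For the two-line file used here, both lines are re-read successfully, so rereadMatches() returns 2.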

Trust me, if you want to grep through logs, going forward till X, backward from there till Y, find Z & re-grep entire file for Z, and then locate the 3rd occurrence of the word "ERROR" also associated with Z... anyway, you get the point. Grep is useless. My functions RULE!

Filed under: C/C++

18 Nov 2011

C/C++ Regular Expressions

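I had to do some regular-expression matching the other day (for a bit of a grep-esque application). Everything below relies on the POSIX <regex.h> interface:

#include <stdio.h>
#include <regex.h>
#include <stdbool.h>  /* bool, when compiling as C rather than C++ */
#include <stdlib.h>
#include <string.h>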

The basic idea behind the "match" function is to provide a case-insensitive means of telling whether the "string" matches the "pattern". This is why I made this particular function a boolean. I find that this becomes much more powerful when used with a tokenizer, such that specific tokens are matched (for example, log-grepping for a specific user-level or process name).

bool
match(const char *string, const char *pattern)
{
  int status;
  char msg[1024];
  regex_t re;

  /* keep regcomp()'s return value so regerror() can describe the failure */
  status = regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB|REG_ICASE);
  if (status != 0) {
    regerror(status, &re, msg, sizeof(msg));
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
    return false;
  }

  status = regexec(&re, string, (size_t)0, NULL, 0);

  if (status != 0 && status != REG_NOMATCH) {
    /* report the error before re is freed */
    regerror(status, &re, msg, sizeof(msg));
    fprintf(stderr, "Error matching regular expression '%s': %s.\n",
            pattern, msg);
  }
  regfree(&re);

  return (status == 0);
}
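As an illustration of the tokenizer idea, here is a minimal sketch that splits a log line on whitespace and counts the tokens that match. countMatchingTokens() is a hypothetical helper and the log line in the note below is made up; the copy of match() here is condensed and keeps the return value of regcomp() so that regerror() has a real error code to describe.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#include <regex.h>

// Case-insensitive "does string match pattern?" test (condensed).
bool match(const char *string, const char *pattern) {
    regex_t re;
    char msg[1024];
    int status = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | REG_ICASE);
    if (status != 0) {
        regerror(status, &re, msg, sizeof msg);
        std::fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
                     pattern, msg);
        return false;
    }
    status = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    return status == 0;
}

// Hypothetical helper: count the whitespace-separated tokens of 'line'
// that match 'pattern'.
int countMatchingTokens(const char *line, const char *pattern) {
    assert(line != NULL && pattern != NULL);
    std::string copy(line);             // strtok() modifies its input
    int hits = 0;
    for (char *tok = std::strtok(&copy[0], " \t"); tok != NULL;
         tok = std::strtok(NULL, " \t")) {
        if (match(tok, pattern)) ++hits;
    }
    return hits;
}
```

For the line "Jan 20 sshd[42]: ERROR user=root" and the pattern "^err(or)?$", only the token "ERROR" matches (case is ignored), so the count is 1. Compiling the pattern once per token is wasteful; a real tool would probably compile it once and reuse it.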

Of course, sometimes matching does not go far enough (or in the previous example, sometimes your tokens are not always aligned). Which is why I created the regular-expression version of strstr(). This does exactly what you think, and searches for the regex "pattern" inside of "string". The only difference is that if it is found, we return a newly allocated copy of the matching substring, which the caller must free(). Likewise if not found, we return NULL.

char *
reSubstring(const char *string, const char *pattern)
{
  int        status;
  size_t     len;
  char       msg[1024];
  regex_t    re;
  regmatch_t pmatch[1];
  char       *p_buf;

  /* keep regcomp()'s return value so regerror() can describe the failure */
  status = regcomp(&re, pattern, REG_EXTENDED|REG_ICASE);
  if (status != 0) {
    regerror(status, &re, msg, sizeof(msg));
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
    return NULL;
  }

  status = regexec(&re, string, (size_t)1, pmatch, 0);

  if (status != 0) {
    if (status != REG_NOMATCH) {
      /* report the error before re is freed */
      regerror(status, &re, msg, sizeof(msg));
      fprintf(stderr, "Error matching regular expression '%s': %s.\n",
              pattern, msg);
    }
    regfree(&re);
    return NULL;
  }
  regfree(&re);

  /* copy the matched span into a NUL-terminated buffer the caller must free() */
  len = (size_t)(pmatch[0].rm_eo - pmatch[0].rm_so);
  p_buf = (char *)malloc(len + 1);
  if (p_buf == NULL) return NULL;
  memcpy(p_buf, string + pmatch[0].rm_so, len);
  p_buf[len] = '\0';

  return p_buf;
}
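Here is a short sketch of how the substring search might be used from C++. firstMatch() is a hypothetical wrapper that copies the result into a std::string and frees the buffer, and the sample strings in the note below are made up; the copy of reSubstring() is condensed and reports nothing on error, which is a simplification for the sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <regex.h>

// Condensed reSubstring(): returns a malloc()ed copy of the first match of
// 'pattern' in 'string', or NULL if there is none (or on error).
char *reSubstring(const char *string, const char *pattern) {
    regex_t re;
    regmatch_t pmatch[1];
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0) return NULL;
    int status = regexec(&re, string, 1, pmatch, 0);
    regfree(&re);
    if (status != 0) return NULL;

    size_t len = (size_t)(pmatch[0].rm_eo - pmatch[0].rm_so);
    char *buf = static_cast<char *>(std::malloc(len + 1));
    if (buf == NULL) return NULL;
    std::memcpy(buf, string + pmatch[0].rm_so, len);
    buf[len] = '\0';
    return buf;
}

// Hypothetical wrapper: the match as a std::string, empty if nothing matched.
std::string firstMatch(const char *string, const char *pattern) {
    assert(string != NULL && pattern != NULL);
    char *found = reSubstring(string, pattern);
    if (found == NULL) return std::string();
    std::string result(found);
    std::free(found);                   // the caller owns the returned buffer
    return result;
}
```

For the input "pid=1234 user=root", the pattern "[0-9]+" yields "1234" and "USER=[a-z]+" yields "user=root", since matching ignores case. One limitation of the wrapper is that an empty result cannot be told apart from an empty match.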

Filed under: C/C++

14 Sep 2011

Python's getcommandoutput() ported to C/C++

I ran into an issue the other day where, being an avid Python programmer, I was trying my best in C/C++ to mimic the command-output-capturing capabilities of Python.
In an effort to produce a C/C++ version of what Python's commands.getoutput() function does, I ended up with the following:

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

int
command_output(const char *cmd, char *output, int readlen)
{
    FILE *fp;
    int status;
    size_t want, got;

    if (output == NULL || readlen <= 0) return 1;
    memset(output, 0, readlen);

    /* return error if pipe error */
    if ( !(fp = popen(cmd, "r")) ) return 1;

    /* read bytes (not one readlen-sized item), leaving room for the
       terminating NUL that memset() already put in place */
    want = (size_t)readlen - 1;
    got = fread(output, 1, want, fp);
    status = pclose(fp);

    /* buffer filled: the output may have been truncated */
    if (got == want) return 0;
    if (status != -1 && WIFEXITED(status)) return 0;
    /* Command did not execute correctly, or was killed */
    return 2;
}

Although it does require a bit more information than the Python equivalent (a caller-supplied buffer and its length), it is extremely handy to have in one's toolbox, especially when your C/C++ internals are forced to interact with preexisting applications. It returns 0 on success, 1 if the pipe could not be opened, and 2 if the command was killed or did not run correctly. Note that the command runs with the same permissions as the application, and thus cannot execute anything that the application does not have permission to execute.
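For completeness, here is a minimal sketch of calling it. capture() is a hypothetical convenience wrapper with an arbitrary 256-byte buffer, and the copy of command_output() here reads byte-by-byte up to readlen-1 bytes, which is my own variation so that short output and truncated output are handled the same way. It assumes a POSIX system with popen() and a shell that provides echo.

```cpp
#include <cassert>
#include <cstring>
#include <stdio.h>      // popen(), pclose() are POSIX additions to stdio
#include <string>
#include <sys/wait.h>

// Capture up to readlen-1 bytes of cmd's standard output; returns 0 on success.
int command_output(const char *cmd, char *output, int readlen) {
    assert(output != NULL && readlen > 0);
    std::memset(output, 0, readlen);

    FILE *fp = popen(cmd, "r");
    if (fp == NULL) return 1;               // pipe or fork error

    size_t want = (size_t)readlen - 1;      // leave room for the terminator
    size_t got = fread(output, 1, want, fp);
    int status = pclose(fp);

    if (got == want) return 0;              // buffer filled, output may be truncated
    if (status != -1 && WIFEXITED(status)) return 0;
    return 2;                               // killed, or could not be waited for
}

// Hypothetical wrapper: the command's output as a std::string (empty on failure).
std::string capture(const char *cmd) {
    char buf[256];
    if (command_output(cmd, buf, (int)sizeof buf) != 0) return std::string();
    return std::string(buf);
}
```

With this, capture("echo hello") gives back "hello" followed by a newline, just as the shell printed it. Output longer than the buffer is silently cut off, so size the buffer for the command you expect to run.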

Filed under: C/C++

4 Apr 2010

C/C++ memcpy() for structs

At some point you will find that you need a permanent copy of a struct, but all that you have available is a pointer to storage that will not stick around (a local variable, or a buffer that is about to be reused). Now you will note that the same idea already exists for file descriptors in dup() and dup2(), which duplicate one descriptor to another. So how can we do this for some other struct?

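#include <stdlib.h>  /* malloc() */
#include <string.h>  /* memcpy() */

/* Shallow copy: pointers inside MY_STRUCT still point to the original data. */
struct MY_STRUCT*
copyStruct(const struct MY_STRUCT *s) {
  if (s == NULL) return NULL;
  struct MY_STRUCT *d = (struct MY_STRUCT*)malloc(sizeof(struct MY_STRUCT));
  if (d == NULL) return NULL;
  memcpy(d, s, sizeof(struct MY_STRUCT));
  return d;
}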

Of course, name it whatever you want, and use whatever struct you want. You get a pointer-to-struct on success and NULL otherwise, so make sure the caller checks the result and calls free() when done. Keep in mind that memcpy() makes a shallow copy: any pointer members of the copy still point at the original's data.
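To make that concrete, here is a small sketch with a made-up MY_STRUCT (the fields id, name and note are assumptions for the example) and a hypothetical makeRecord() that fills in a local struct and hands back a heap copy that outlives it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Made-up layout for illustration: one pointer member shows the shallow copy.
struct MY_STRUCT {
    int   id;
    char  name[16];
    char *note;
};

struct MY_STRUCT *copyStruct(const struct MY_STRUCT *s) {
    if (s == NULL) return NULL;
    struct MY_STRUCT *d = (struct MY_STRUCT *)std::malloc(sizeof(struct MY_STRUCT));
    if (d == NULL) return NULL;
    std::memcpy(d, s, sizeof(struct MY_STRUCT));
    return d;
}

// Hypothetical helper: build a record in a local variable and return a copy
// that stays valid after the local goes out of scope.
struct MY_STRUCT *makeRecord(int id, const char *name, char *note) {
    assert(name != NULL);
    struct MY_STRUCT local;
    local.id = id;
    std::strncpy(local.name, name, sizeof local.name - 1);
    local.name[sizeof local.name - 1] = '\0';
    local.note = note;              // only the pointer is copied, not the text
    return copyStruct(&local);      // 'local' dies here, the copy survives
}
```

The caller owns the returned record and releases it with free(). The name array is duplicated byte for byte, while note still points at whatever the caller passed in, so that buffer has to outlive the copy.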

Filed under: C/C++