Je t’embrasse Salutations from Silicon Valley, California

18 Nov 2011

C/C++ Regular Expressions

I had to do some regular-expression matching the other day for a grep-like application, so here are two small helpers built on the POSIX <regex.h> API. They need the following headers:

#include <stdio.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>  /* bool, true, false in C; harmless in C++ */

The basic idea behind the match() function is to give a case-insensitive yes/no answer to whether "string" contains a match for "pattern", which is why it returns a bool. Note that regexec() looks for the pattern anywhere in the string; anchor the pattern with ^ and $ if the whole string must match. I find this becomes much more powerful when combined with a tokenizer, so that individual tokens are matched (for example, when grepping logs for a specific user level or process name).

bool
match(const char *string, const char *pattern)
{
  int status;
  char msg[1024];
  regex_t re;

  /* Extended, case-insensitive regex; REG_NOSUB because we only need
     a yes/no answer, not the offsets of the match. */
  status = regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB|REG_ICASE);
  if (status != 0) {
    regerror(status, &re, msg, sizeof msg);
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
    return false;
  }

  status = regexec(&re, string, (size_t)0, NULL, 0);

  /* Report real errors before regfree(), while re is still valid. */
  if (status != 0 && status != REG_NOMATCH) {
    regerror(status, &re, msg, sizeof msg);
    fprintf(stderr, "Error matching regular expression '%s': %s.\n",
            pattern, msg);
  }

  regfree(&re);
  return status == 0;
}
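The tokenizer idea above can be sketched roughly like this. Because match() compiles its pattern on every call, running it once per token repeats that work; the hypothetical helper below (count_matching_tokens is an illustrative name, not part of the original code) compiles the pattern once and tests each token against it. The tokenizer here is an assumption: it simply splits on spaces and tabs.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the original post): counts the
   space/tab-separated tokens in `line` that contain a case-insensitive
   match for `pattern`. Returns -1 on a bad pattern or out of memory. */
int
count_matching_tokens(const char *line, const char *pattern)
{
  regex_t     re;
  char        msg[1024];
  const char *p = line;
  char       *tok;
  int         status, count = 0;

  /* Compile once and reuse `re` for every token, instead of calling
     match() (which recompiles) once per token. */
  status = regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB|REG_ICASE);
  if (status != 0) {
    regerror(status, &re, msg, sizeof msg);
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
    return -1;
  }

  /* Scratch buffer big enough for the longest possible token. */
  tok = malloc(strlen(line) + 1);
  if (tok == NULL) {
    regfree(&re);
    return -1;
  }

  while (*p != '\0') {
    size_t n;

    p += strspn(p, " \t");       /* skip separators */
    n = strcspn(p, " \t");       /* length of the next token */
    if (n == 0)
      break;                     /* nothing but separators was left */

    memcpy(tok, p, n);
    tok[n] = '\0';               /* regexec() needs a NUL-terminated string */
    if (regexec(&re, tok, 0, NULL, 0) == 0)
      count++;
    p += n;
  }

  free(tok);
  regfree(&re);
  return count;
}
```

For instance, with the pattern "^sshd\\[" the ^ anchors to the start of each token rather than the start of the line, so only a token such as "sshd[412]:" is counted.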

Of course, sometimes a yes/no match does not go far enough (or, in the previous example, your tokens are not always aligned). That is why I wrote a regular-expression version of strstr(). It does what you would expect: it searches for the regex "pattern" inside "string". The difference is that, where strstr() returns a pointer into the original string, reSubstring() returns a newly allocated copy of the matched text, which the caller must free(). If there is no match, it returns NULL.

char *
reSubstring(const char *string, const char *pattern)
{
  int        status;
  size_t     len;
  char       msg[1024];
  regex_t    re;
  regmatch_t pmatch[1];
  char       *p_buf;

  /* No REG_NOSUB here: we need the match offsets in pmatch[0]. */
  status = regcomp(&re, pattern, REG_EXTENDED|REG_ICASE);
  if (status != 0) {
    regerror(status, &re, msg, sizeof msg);
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
    return NULL;
  }

  status = regexec(&re, string, (size_t)1, pmatch, 0);

  if (status != 0) {
    /* Report real errors before regfree(), while re is still valid. */
    if (status != REG_NOMATCH) {
      regerror(status, &re, msg, sizeof msg);
      fprintf(stderr, "Error matching regular expression '%s': %s.\n",
              pattern, msg);
    }
    regfree(&re);
    return NULL;
  }
  regfree(&re);

  /* pmatch[0] holds the byte range [rm_so, rm_eo) of the whole match. */
  len = (size_t)(pmatch[0].rm_eo - pmatch[0].rm_so);
  p_buf = (char *)malloc(len + 1);   /* cast is only required in C++ */
  if (p_buf == NULL)
    return NULL;

  memcpy(p_buf, string + pmatch[0].rm_so, len);
  p_buf[len] = '\0';

  return p_buf;                      /* caller must free() */
}
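Since reSubstring() returns a malloc()'d copy, every successful call has to be paired with free(). When a pointer into the original buffer is enough, a closer analogue of strstr() is possible. The sketch below is hypothetical (reFind and its len out-parameter are illustrative names, not from the original code): it returns a pointer to the start of the match inside "string" and reports the match length separately, because the matched text is not NUL-terminated.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical strstr()-style variant: returns a pointer to the first
   case-insensitive match of `pattern` inside `string` (not a copy) and
   stores the match length in *len. Returns NULL on no match or error.
   Nothing is allocated, so there is nothing for the caller to free(). */
const char *
reFind(const char *string, const char *pattern, size_t *len)
{
  regex_t    re;
  regmatch_t pmatch[1];
  char       msg[1024];
  int        status;

  status = regcomp(&re, pattern, REG_EXTENDED|REG_ICASE);
  if (status != 0) {
    regerror(status, &re, msg, sizeof msg);
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
    return NULL;
  }

  status = regexec(&re, string, 1, pmatch, 0);
  if (status != 0 && status != REG_NOMATCH) {
    regerror(status, &re, msg, sizeof msg);
    fprintf(stderr, "Error matching regular expression '%s': %s.\n",
            pattern, msg);
  }
  regfree(&re);

  if (status != 0)
    return NULL;

  /* rm_so and rm_eo are byte offsets from the start of `string`. */
  *len = (size_t)(pmatch[0].rm_eo - pmatch[0].rm_so);
  return string + pmatch[0].rm_so;
}
```

The result is only valid as long as "string" is, and it should be printed with a precision, e.g. printf("%.*s\n", (int)len, p), rather than with a plain %s.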