Je t’embrasse Salutations from Silicon Valley, California

20 Jan 2012

std::list iterate + erase

I ran into an interesting problem the other day: I needed to iterate over a std::list, removing invalid items from that list as I went. The catch was that the list is a list of pointers, each pointing to an allocated value that must be freed.

Here is what I started with:

#include <stdio.h>
#include <stdlib.h>   /* malloc(), free() */
#include <list>
using std::list;

struct COORD { int x,y; };

int i,j;
struct COORD *c;
list<struct COORD *> coordList;
list<struct COORD *>::iterator it1, it;

/* Prep the list (in a real program this and the loops below live inside a function) */
for (i=0, j=0; i<10 && j<10; i++, j++) {
    c = (struct COORD *)malloc(sizeof(struct COORD));
    c->x = i; c->y = j;
    coordList.push_back(c);
}

/* Display while erasing all - for loop */
for (it1=coordList.begin(); it1!=coordList.end(); it1++) {
    c = (*it1);
    printf("A: (%d,%d)\n", c->x, c->y);
    coordList.erase(it1); /* segfault */
    free(c);
}

As you can see, the problem with this is in the .erase() call. Erasing an element invalidates every iterator that refers to it, so as soon as the item is taken out of the list, "it1" still points to a node that is no longer part of the list. The loop then runs it1++ on that invalid iterator, which is undefined behavior. Even if you get lucky once or twice, eventually "it1" will point to a value outside of the bounds of coordList.begin() and coordList.end(). At this point, you will continue, because it1!=coordList.end()... and you will segfault when you try to dereference or erase the now-invalid iterator.

The fix is to not auto-increment the iterator.

/* Display while erasing all - for loop */
for (it=coordList.begin(); it!=coordList.end(); ) {
    c = (*it);
    printf("A: (%d,%d)\n", c->x, c->y);
    coordList.erase(it++);
    free(c);
}

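Note that by removing the increment step from the for statement, we have to do the increment ourselves. The post-increment matters here: "it++" moves "it" on to the next element and hands the old position to .erase(), so "it" never refers to the erased element.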

Of course, you could also use the return value of the .erase() method, which is documented as:

A bidirectional iterator pointing to the new location of the element that followed the last element erased by the function call, which is the list end if the operation erased the last element in the sequence.

Because of that we can do something a bit fancier:

/* Display while erasing all - for loop #2 */
for (it=coordList.begin(); it!=coordList.end(); ) {
    c = (*it);
    printf("B: (%d,%d)\n", c->x, c->y);
    it = coordList.erase(it);
    free(c);
}

See how we set the iterator to the next valid element by using the return value of the .erase() method. This is a relatively elegant solution, but it only works here because every pass through the loop calls .erase(), so the iterator is always updated. If we had to limit the erase, only deleting odd values (for example), we would have to remember to increment the iterator ourselves whenever nothing is erased (otherwise the loop would stay pointing at the same list element forever).

/* Display while erasing some - for loop #3 */
for (it=coordList.begin(); it!=coordList.end(); ) {
    c = (*it);

    if ((c->x % 2) == 1) {
        printf("C: (%d,%d)\n", c->x, c->y);
        it = coordList.erase(it);
        free(c);
    }
    else {
        ++it;
    }
}

That's it. My trial-and-error wisdom, passed on.

Filed under: C/C++ No Comments
24 Dec 2011

Traversing a file in C

I cannot begin to tell you how useful the following code has been in my endeavors to do complicated forward-backward grep-esque searching. Along with the regular-expression matching that I put together previously, this rounds off pretty much everything you need to do your own fancy-grepping.

The code below is based upon the fgetc() and fgets() functions. The first two, rgetc() and rgets(), are essentially the reverse of the original functions. They read from the file, but instead of moving the file-pointer forward, they move it back. Thus, you could start at end-of-file, and traverse all the way back to beginning-of-file. Note that, unlike fgets(), rgets() returns 0 on success and EINVAL on failure rather than a pointer.

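#include <stdio.h>
#include <errno.h>    /* EINVAL */
#include <string.h>   /* strstr() */

/* Step back two bytes, then read one: a net move of one byte backward. */
int
rgetc(FILE *stream)
{
  if (fseek(stream, -2, SEEK_CUR) == -1) return EOF;
  return fgetc(stream);
}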

int
rgets(char *s, int size, FILE *stream)
{
  int n=0;
  int c;

  while (1) {
    if ((c = rgetc(stream)) == EOF) {
      /* if we are too close to BOF to rgetc() */
      if ((ftell(stream) <= 2) && (n+1 == 2)) {
        rewind(stream);
        n=2;
      }
      /* otherwise EOF == ERROR */
      else return EINVAL;
    }
    if (c == '\n') n++;
    if (n == 2) break;
  }

  if (fgets(s, size, stream) == NULL) return EINVAL;
  return 0;
}

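Finally, there are always times when what you really want is to read without moving the file-pointer at all. This way, you get a character/line into a buffer, but you still have the same character/line pointed to at the end as you did at the beginning. In practice, tgetc() re-reads the character just before the file-pointer, and tgets() re-reads the line you just consumed when the file-pointer sits right after it (as it does following fgets()). (Very useful when you need to double-parse a line.)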

int
tgetc(FILE *stream)
{
  if (fseek(stream, -1, SEEK_CUR) == -1) return EOF;
  return fgetc(stream);
}

int
tgets(char *s, int size, FILE *stream)
{
  int n=0;
  int c;

  while (1) {
    if ((c = rgetc(stream)) == EOF) {
      /* if we are too close to BOF to rgetc(), we are on the first line */
      if (ftell(stream) <= 2) {
        rewind(stream);
        n=1;
      }
      /* otherwise EOF == ERROR */
      else return EINVAL;
    }
    if (c == '\n') n++;
    if (n == 1) break;
  }

  /* Look for a newline, otherwise EOF */
  if (fgets(s, size, stream) == NULL) return EINVAL;
  if (strstr(s, "\n") == NULL) return EOF;
  return 0;
}

Trust me, if you want to grep through logs, going forward till X, backward from there till Y, find Z & re-grep entire file for Z, and then locate the 3rd occurrence of the word "ERROR" also associated with Z... anyway, you get the point. Grep is useless. My functions RULE!

Filed under: C/C++ No Comments
18 Nov 2011

C/C++ Regular Expressions

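I had to do some regular-expression matching the other day (a bit of a grep-esque application). Both functions below rely on the POSIX <regex.h> interface and need the following headers:

#include <stdio.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>  /* bool, when compiling as C */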

The basic idea behind the "match" function is to provide a case-insensitive means of telling whether the "string" matches the "pattern". This is why I made this particular function return a boolean. I find that this becomes much more powerful when used with a tokenizer, such that specific tokens are matched. (For example, log-grepping for a specific user-level or process name.)

bool
match(const char *string, const char *pattern)
{
  int status;
  char msg[1024];
  regex_t re;

  /* keep regcomp()'s return code so regerror() can explain the failure */
  status = regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB|REG_ICASE);
  if (status != 0) {
    regerror(status, &re, msg, sizeof(msg));
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
    return false;
  }

  status = regexec(&re, string, (size_t)0, NULL, 0);

  /* report real errors before regfree(), while re is still valid */
  if (status != 0 && status != REG_NOMATCH) {
    regerror(status, &re, msg, sizeof(msg));
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
  }
  regfree(&re);

  return (status == 0);
}

Of course, sometimes matching does not go far enough (or, in the previous example, sometimes your tokens are not always aligned). This is why I created the regular-expression version of strstr(). It does exactly what you think, and searches for the regex "pattern" inside of "string". The only difference is that if a match is found, we return a newly allocated copy of the matching substring, which the caller must free(). If nothing is found, we return NULL.

char *
reSubstring(const char *string, const char *pattern)
{
  int        status;
  size_t     len;
  char       msg[1024];
  regex_t    re;
  regmatch_t pmatch[1];
  char       *p_buf;

  /* keep regcomp()'s return code so regerror() can explain the failure */
  status = regcomp(&re, pattern, REG_EXTENDED|REG_ICASE);
  if (status != 0) {
    regerror(status, &re, msg, sizeof(msg));
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
    return NULL;
  }

  status = regexec(&re, string, (size_t)1, pmatch, 0);

  /* report real errors before regfree(), while re is still valid */
  if (status != 0 && status != REG_NOMATCH) {
    regerror(status, &re, msg, sizeof(msg));
    fprintf(stderr, "Error analyzing regular expression '%s': %s.\n",
            pattern, msg);
  }
  regfree(&re);
  if (status != 0) return NULL;

  /* copy the matched span into a new NUL-terminated buffer */
  len = (size_t)(pmatch[0].rm_eo - pmatch[0].rm_so);
  p_buf = (char *)malloc(len + 1);
  if (p_buf == NULL) return NULL;
  memcpy(p_buf, string + pmatch[0].rm_so, len);
  p_buf[len] = '\0';

  return p_buf;
}
Filed under: C/C++ No Comments
14 Sep 2011

Python's getoutput() ported to C/C++

I ran into an issue the other day where, being an avid Python programmer, I was trying my best in C/C++ to mimic the command-output-capturing capabilities of Python.
In an effort to produce a C/C++ version of the Python commands.getoutput() function, I ended up with the following:

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

int
command_output(const char *cmd, char *output, int readlen)
{
    FILE *fp;
    int status;
    size_t n;

    if (output == NULL || readlen <= 0) return 1;
    memset(output, 0, readlen);

    /* return error if pipe error */
    if ( !(fp = popen(cmd, "r")) ) return 1;

    /* read at most readlen-1 bytes, leaving room for the terminator */
    n = fread(output, 1, (size_t)readlen - 1, fp);

    /* guarantee NUL termination */
    output[n] = '\0';
    status = pclose(fp);

    /* buffer filled: we stopped reading early, so ignore the exit status */
    if (n == (size_t)readlen - 1) return 0;
    /* Command did not execute correctly, or was killed */
    if (status == -1 || !WIFEXITED(status)) return 2;
    return 0;
}

Although it does require a bit more information than the Python equivalent, it is extremely handy to have in one's toolbox, especially when your C/C++ internals are forced to interact with preexisting applications. Note that the command runs with the same permissions as the application, and thus cannot execute anything that the application itself does not have permission to execute.

Filed under: C/C++ No Comments
20 May 2011

commands to subprocess transition

The Python Documentation goes a long way in describing how to transition old code, but for some reason, fails to mention how to transition the legacy "commands.getoutput()" and "commands.getstatusoutput()" functionality into the new Python 2.6+ model of using the subprocess module.

Here is the code required (direct pull/modification of original 2.5 commands module code):

# Get the output from a shell command into a string.
# The exit status is ignored; a trailing newline is stripped.
# Assume the command will work with '{ ... ; } 2>&1' around it..
def getoutput(cmd):
    return getstatusoutput(cmd)[1]

# Ditto but preserving the exit status.
# Returns a pair (sts, output)
def getstatusoutput(cmd):
    import subprocess, os
    p = subprocess.Popen('{ ' + cmd + '; } 2>&1',
                         shell=True, stdout=subprocess.PIPE)
    # read everything before waiting, or a chatty child can fill the pipe and deadlock
    text = p.stdout.read()
    p.stdout.close()
    sts = os.waitpid(p.pid, 0)[1]
    if text[-1:] == '\n': text = text[:-1]
    return sts, text
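
On Python 3 the same pair ships in the standard library as subprocess.getoutput() and subprocess.getstatusoutput(). For anyone who wants to see the moving parts, here is a minimal sketch built on subprocess.run(). It assumes Python 3.5+ and a POSIX shell, and unlike the code above it returns the plain exit code rather than the raw wait status.

```python
import subprocess

def getstatusoutput(cmd):
    """Run cmd through the shell; return (exit_code, combined stdout+stderr)."""
    proc = subprocess.run(cmd, shell=True,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,   # same effect as '2>&1'
                          universal_newlines=True)    # decode bytes to str
    text = proc.stdout
    if text.endswith("\n"):
        text = text[:-1]                              # strip one trailing newline
    return proc.returncode, text

def getoutput(cmd):
    """Same as getstatusoutput(), but the exit code is dropped."""
    return getstatusoutput(cmd)[1]
```

Because subprocess.run() collects the output while it waits for the child, there is no ordering to get wrong between reading the pipe and reaping the process.
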
Filed under: Python No Comments
13 Apr 2011

Pretty-Print for Python Dict/List/Tuple

I wrote this a while ago to give me a better view into complex nested structures (Dict-of-List-of-Dict, etc.).
The benefit here is that these functions can be wrapped as a Python module, and imported into all your projects for debugging.

Enjoy.

import sys

def newline():
    sys.stdout.write('\n')
    sys.stdout.flush()

def __type_quote(this):
    if type(this) == type(1): return str(this)
    else: return "'"+str(this)+"'"

def __has_children(parent):
    answer = False
    valid = [type([1,2]), type((1,2)), type({1:2})]
    if type(parent) == type([1,2]):
        for each in parent:
            if type(each) in valid: answer = True
    elif type(parent) == type((1,2)):
        answer = __has_children(list(parent))
    elif type(parent) == type({1:2}):
        for (key, value) in parent.items():
            if type(value) in valid: answer = True
    return answer

def __print_list(list, n=0, shrink=False, opener="[", closer="]"):
    assert type(list) == type([])
    sp = "".join([" "]*n)
    valid = [type([1,2]), type((1,2)), type({1:2})]

    # find the max index (for spacing of the format string)
    size = len(list)
    fstring = "%s  %-"+str(size)+"s: "

    # start printing stuff
    sys.stdout.write(opener)
    sys.stdout.flush()
    # keep the caller's order so the printed indices stay meaningful
    for index in range(len(list)):
        newline()
        sys.stdout.write(fstring%(sp, index))
        sys.stdout.flush()
        this = list[index]
        if not __has_children(this) and type(this) in valid:
            if type(this) == type([1,2]) and len(this) > 10 and shrink == True:
                sys.stdout.write("['%s', ... ,'%s']"%(this[0], this[-1]))
                sys.stdout.flush()
            else:
                sys.stdout.write(str(this))
                sys.stdout.flush()
        elif type(this) == type([1,2]): __print_list(this, n+4+size)
        elif type(this) == type({1:2}): __print_dict(this, n+4+size)
        elif type(this) == type((1,2)): __print_tuple(this, n+4+size)
        else:
            sys.stdout.write(str(this))
            sys.stdout.flush()
    if len(list) > 0:
        newline()
        sys.stdout.write("%s%s"%(sp, closer))
    else: sys.stdout.write("%s"%closer)
    sys.stdout.flush()

def __print_dict(dict, n=0, shrink=False):
    assert type(dict) == type({1:2})
    sp = "".join([" "]*n)
    valid = [type([1,2]), type((1,2)), type({1:2})]

    # find the max key-size (for spacing of the format string)
    size = 0
    for key in dict.keys():
        if len(str(key)) > size: size = len(str(key))
    fstring = "%s  %-"+str(size)+"s: "

    # start printing stuff
    sys.stdout.write("{")
    sys.stdout.flush()
    keys = sorted(dict.keys())
    for key in keys:
        newline()
        sys.stdout.write(fstring%(sp, key))
        sys.stdout.flush()
        this = dict[key]
        if not __has_children(this) and type(this) in valid:
            if type(this) == type([1,2]) and len(this) > 10 and shrink == True:
                sys.stdout.write("[%s, ... ,%s]"%(__type_quote(this[0]),
                                                  __type_quote(this[-1])))
                sys.stdout.flush()
            else:
                sys.stdout.write(str(this))
                sys.stdout.flush()
        elif type(this) == type([1,2]): __print_list(this, n+4+size)
        elif type(this) == type({1:2}): __print_dict(this, n+4+size)
        elif type(this) == type((1,2)): __print_tuple(this, n+4+size)
        else:
            sys.stdout.write(__type_quote(this))
            sys.stdout.flush()
    if len(dict.keys()) > 0:
        newline()
        sys.stdout.write("%s}"%sp)
    else: sys.stdout.write("}")
    sys.stdout.flush()

def __print_tuple(tup, n=0, shrink=False):
    assert type(tup) == type((1,2))
    __print_list(list(tup), n, shrink, opener="(", closer=")")

def print_dict(d):
    __print_dict(d, shrink=True)
    newline()

def print_list(l):
    __print_list(l, shrink=True)
    newline()

def print_tuple(t):
    __print_tuple(t, shrink=True)
    newline()
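
For quick debugging, the standard library's pprint module covers much of the same ground. This is a swapped-in alternative, not the functions above: a minimal sketch assuming Python 3, with a made-up sample structure.

```python
import pprint

# hypothetical sample data: a dict holding a list of dicts and a tuple
sample = {
    "hosts": [
        {"name": "alpha", "ports": [22, 80, 443]},
        {"name": "beta", "ports": [22]},
    ],
    "owner": ("ops", 42),
}

# pformat() returns the formatted text; pprint.pprint() would print it directly
text = pprint.pformat(sample, indent=2, width=40)
print(text)
```

The width argument is what forces the nested structure onto several lines; with the default of 80 a small structure may still come out on a single line.
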
Filed under: Python No Comments
26 Dec 2010

Christopsomo (Greek Christmas Bread)

This recipe comes from my mother, via the "Sunset Cook Book of Breads" (1984).

Ingredients:
2 Tbsp Active Dry Yeast
1/2 Cup Warm Water
1/2 Cup Scalded & Cooled Milk
1 Cup Butter
4 Eggs, beaten
3/4 Cup Granulated Sugar
2 teaspoons crushed Anise seed
1 teaspoon salt
7 Cups all-purpose flour

Directions:
Mix yeast with warm water & set aside for about 5 minutes. Mix together yeast mixture, milk, butter, eggs, salt, sugar, and anise thoroughly before adding flour. Gradually add flour one cup at a time mixing/massaging the flour in evenly.

Knead the dough until its elasticity pushes back, and the dough is smooth. Roll into a ball in a large greased bowl, making sure to get the surface of the dough covered in the oil/grease. Set aside to rise (for the first time) in a warm/moist location. (Or if you are lazy like me, put it in an empty oven, with some almost-boiling water in an adjacent pan.)

When the dough has doubled in size (1-2 hours), punch it down & divide it up into 2 even halves. Additionally, cut off about 1 Cup worth of dough from each, and set aside (it will be used later to decorate). Knead each half into a smooth round, and place on a flat baking sheet. Shape the reserved 1 Cup of dough into two equal-length ropes, cutting down each end of the ropes to create the traditional Greek cross shape. Finally, garnish the holes with walnuts or candied cherries, and wipe the top with an egg-white to add shine.

Once shaped, let rise again (the last time) until almost double in size. At this point, place in a 350-degree oven for about 45 minutes. (If you are me, and have let the bread rise IN the oven, no worries, just turn on the oven to 350, and add 10 minutes for it to preheat WITH the bread already in there.)

Alternatives:
You can make this same recipe into a single large loaf, but it is a bit unwieldy, and it is relatively hard to find a pan (other than a pizza pan) big enough for it.

Filed under: Recipes No Comments
20 Jun 2010

An academic approach to ‘import’

As they say, Python is a language that does not force you to do things. Instead, it assumes the programmer to be a consenting adult who, knowingly or unknowingly, has the power to bend and mold existing paradigms to fit whatever task is at hand. While this easy-peasy mindset can sometimes be an absolute asset, it will inevitably (hopefully not more than once) lead you down the winding path of destruction. Personally, I feel that nothing in Python is quite as capable of dragging people down as the "import" statement.

Standard Imports:

import os
from os.path import isfile as isFile
from time import *

Hopefully this is not the first time you've heard this, but you should avoid the asterisk on imports. Unless you own the imported module (like a file with a bunch of your own common functions in it), you can never be sure that a name defined within the import has not just blown away a function with the same name in your application.
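
To see how easily an asterisk import can clobber a name, the sketch below runs one inside a scratch namespace and reports which builtins it replaced. The helper name shadowed_builtins() is made up for this illustration, and Python 3 is assumed.

```python
import builtins

def shadowed_builtins(module_name):
    """List the builtin names that 'from <module_name> import *' rebinds."""
    namespace = {}
    # run the star import in a throwaway namespace instead of our own globals
    exec("from %s import *" % module_name, namespace)
    return sorted(
        name for name, value in namespace.items()
        if not name.startswith("__")
        and hasattr(builtins, name)
        and value is not getattr(builtins, name)
    )

print(shadowed_builtins("os"))    # includes 'open': os.open() replaces the builtin
print(shadowed_builtins("math"))  # includes 'pow': math.pow() replaces the builtin
```

After "from os import *", a plain open("file.txt") no longer calls the builtin, which is exactly the kind of silent breakage described above.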

User-specific Imports:
But what about the following... (Linux only I believe)

import os
import sys
sys.path.append(os.path.join(os.environ['HOME'], 'scripts'))
import userScript

This can be really helpful to provide some user-specific functionality. Basically the username determines the home directory, and the home directory then provides you with a specific subdirectory of Python scripts. If you log in as "root" you will get one set of functionality, whereas if you log in as "user" you will get a different set of functionality.

Watch out though, as environment variables are unreliable. Anybody and anything can change these variables, which can leave you depending on incorrect or missing values. One of the easiest ways to see this is by having a properly daemonized C/C++ application that calls a Python script. When you log on as any user and execute the binary, you will see the expected results. If you call it remotely (via socket or RPC or whatever), the process runs as a system user, which unfortunately does not have certain environment variables (like "HOME")... in other words, remote call = failure.

Application-relative (not path-relative) Imports:
Here is another one that I have seen, but have rarely used. Similarly to the previous condition, it requires appending to the system path, but this time we are working with a relative directory.

import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
import someClass

There are two special things happening here. First, we are extracting the directory of the file containing this code from "__file__". If we were to type this into the interactive interpreter, we would get an error because there is no "__file__" there. Second, because "__file__" always refers to the Python file itself, it doesn't matter where you execute from, absolute path or not, you will always be referring to ".." from the Python file. This is EXTREMELY useful if you need to create an application that can be run via both relative and absolute path.
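
The same idea can be wrapped in a small helper. This is only a sketch: add_relative_path() is a hypothetical name, Python 3 is assumed, and the demo path stands in for "__file__" so the snippet can run anywhere.

```python
import os
import sys

def add_relative_path(script_file, relative=os.pardir):
    """Append a directory, given relative to script_file, to sys.path."""
    # abspath() pins the location down even if script_file is a relative path
    here = os.path.dirname(os.path.abspath(script_file))
    target = os.path.normpath(os.path.join(here, relative))
    if target not in sys.path:
        sys.path.append(target)
    return target

# in a real script the argument would be __file__; this path is illustrative
example = os.path.join(os.getcwd(), "bin", "tool.py")
print(add_relative_path(example))
```

Calling it once near the top of the script, before anything changes the working directory, keeps a relative "__file__" from resolving against the wrong place.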

Path-agnostic Import Best Practices:
Although I have given multiple examples of how to get the system path to not care where you are executing from, I have for the most part found that the best practice with import paths is to use a configuration file. Why did I just complicate the import process even more? Well, let's take a look at the code:

import os
import sys
config = os.path.join(os.path.dirname(__file__), "config.cfg")
lines = open(config).read().splitlines()
paths = [line.split("=", 1)[1] for line in lines if line.startswith("path=")]
for path in paths: sys.path.append(path)
import customModule

The benefit here is that now when you want to swap out "customModule" with a different version for testing, you don't need to move any directories, or rename any imports. All you do is change the config file. And yes, I did go a little overboard here, and allow you to add as many paths to the system path list as you want. The other big benefit here is that, since you are always specifying an absolute path in the configuration file, the import no longer depends on where you execute from; it succeeds as long as the configured directory exists and contains the module.
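
For reference, here is a slightly more defensive sketch of the config-file approach. The "path=<dir>" line format is inferred from the code above, and the function name read_import_paths() and the directory names are made up for the demo. Python 3 is assumed.

```python
import os
import sys
import tempfile

def read_import_paths(config_file):
    """Return the value of every 'path=<dir>' line in config_file."""
    paths = []
    with open(config_file) as handle:
        for line in handle:
            key, sep, value = line.partition("=")
            # skip blank lines, other keys, and lines without '='
            if sep and key.strip() == "path" and value.strip():
                paths.append(value.strip())
    return paths

# demo with a throwaway config file; the directory names are made up
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as cfg:
    cfg.write("name=demo\npath=/opt/modules/stable\npath = /opt/modules/testing\n")

found = read_import_paths(cfg.name)
os.remove(cfg.name)
for path in found:
    if path not in sys.path:
        sys.path.append(path)
print(found)
```

Swapping the module version then means editing one "path=" line, with no change to the importing code.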

Filed under: Python No Comments
28 May 2010

Using a Property to get a Class name into an Attribute

For one reason or another, I found myself in a slight predicament. I needed to get the name of my class into an attribute.

Thanks to Google + KyLev I managed to procure a workable solution:

class Problem(object):
    moduleIndex = 0
    def __init__(self):
        self.__class__.moduleIndex = self.__class__.moduleIndex + 1

class Solution(object):
    moduleIndex = 0
    moduleName = property(fget=lambda self: self.__class__.__name__)
    def __init__(self):
        self.__class__.moduleIndex = self.__class__.moduleIndex + 1
        print("[%s] %s" % (self.moduleIndex, self.moduleName))

if __name__ == "__main__": t = Solution()
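
The payoff shows up with inheritance: because the property is evaluated on every access, each subclass reports its own name and keeps its own counter. A minimal sketch, assuming Python 3; the class names Module, Parser and Writer are made up for the illustration.

```python
class Module(object):
    moduleIndex = 0
    # evaluated on every access, so subclasses report their own name
    moduleName = property(fget=lambda self: type(self).__name__)

    def __init__(self):
        # count instances on whichever class is actually being instantiated
        type(self).moduleIndex = type(self).moduleIndex + 1

class Parser(Module):
    pass

class Writer(Module):
    pass

p1, p2, w = Parser(), Parser(), Writer()
print("[%s] %s" % (p2.moduleIndex, p2.moduleName))  # prints "[2] Parser"
print("[%s] %s" % (w.moduleIndex, w.moduleName))    # prints "[1] Writer"
```

The first assignment through type(self) creates a separate moduleIndex on the subclass, so the base class counter is left untouched.
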
Filed under: Python No Comments
4 Apr 2010

C/C++ memcpy() for structs

At some point you will find that you need a permanent copy of a struct, but all that you have available is a pointer to a struct that may not stick around (a local or otherwise temporary value). You will note that the same idea already exists in the dup() and dup2() functions, which take a file descriptor and duplicate it. So how can we do this for some other struct?

#include <stdlib.h>   /* malloc() */
#include <string.h>   /* memcpy() */

struct MY_STRUCT*
copyStruct(const struct MY_STRUCT *s) {
  if (s == NULL) return NULL;
  struct MY_STRUCT *d = (struct MY_STRUCT*)malloc(sizeof(struct MY_STRUCT));
  if (d == NULL) return NULL;
  memcpy(d, s, sizeof(struct MY_STRUCT));
  return d;
}

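Of course, name it whatever you want, and use whatever struct you want. You get a pointer-to-struct on success, otherwise you get NULL... so make sure the caller checks... and calls free() when appropriate. Keep in mind that memcpy() makes a shallow copy: any pointer members in the copy still point at the same data as the original.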

Filed under: C/C++ No Comments