String Extension Method: IsGuid()

I wrote an extension method for the string type that checks whether a string is a valid Guid. The method returns a bool.

[code:c#]

using System.Text.RegularExpressions;

public static class StringExtensions
{
  public static bool IsGuid(this string input)
  {
    // Optional braces around the usual 8-4-4-4-12 groups of hex digits.
    // Note: the Regex is constructed (and compiled) on every call.
    Regex isGuid = new Regex(@"^({){0,1}[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(}){0,1}$", RegexOptions.Compiled);
    return input != null && isGuid.IsMatch(input);
  }
}

[/code]

In case you're wondering: I didn't write the Regex myself 🙂 (but it works).
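
Calling it looks like any other instance method on a string. The snippet below is only a usage sketch: the `Demo` class and the sample values are made up for illustration, and the extension class repeats the pattern in a compact form (escaped braces, Regex kept in a static field) so that the snippet compiles on its own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Compact variant of the pattern from the post, kept in a static field
    // so this sketch is self-contained.
    private static readonly Regex GuidPattern = new Regex(
        @"^\{?[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\}?$");

    public static bool IsGuid(this string input)
    {
        return input != null && GuidPattern.IsMatch(input);
    }
}

public static class Demo
{
    public static void Main()
    {
        string fromUser = "60E949FF-CE08-45CE-AADD-2F2251B56589"; // sample value
        string missing = null;

        Console.WriteLine(fromUser.IsGuid());      // True
        Console.WriteLine("not a guid".IsGuid());  // False
        // Extension methods compile to static calls, so a null receiver
        // reaches the null check instead of throwing.
        Console.WriteLine(missing.IsGuid());       // False
    }
}
```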

This brute-force method is much faster when the input is usually a valid Guid (thanks to Arjan and Tag for the comments):

[code:c#]

public static bool IsGuid(this string input)
{
  try
  {
    // The Guid constructor throws if the input is not a valid Guid.
    new Guid(input);
    return true;
  }
  catch (ArgumentNullException) {}
  catch (FormatException) {}
  catch (OverflowException) {}
  return false;
}

[/code]
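
If you are on .NET Framework 4 or later, the same check can probably be written without any exception handling by using `Guid.TryParse`, which reports failure through its return value. A minimal sketch under that assumption (the class name `GuidStringExtensions` is just an illustrative choice, not something from the comments):

```csharp
using System;

public static class GuidStringExtensions
{
    // Sketch: Guid.TryParse returns false for null or malformed input
    // instead of throwing, so no try/catch is needed.
    public static bool IsGuid(this string input)
    {
        Guid ignored;
        return Guid.TryParse(input, out ignored);
    }
}
```

Like the Guid constructor, `Guid.TryParse` also accepts the other standard Guid formats (for example 32 hex digits without hyphens), so both are more permissive than the regular expression.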

A good example of how extension methods can make some things easier.

Hope it helps!

7 Replies to “String Extension Method: IsGuid()”

  1. This brute-force method is generally much faster…

    public static bool IsGuid(this string input)
    {
        try
        {
            new Guid(input);
            return true;
        }
        catch (ArgumentNullException) {}
        catch (FormatException) {}
        catch (OverflowException) {}
        return false;
    }

  2. Nice, but Regex can take care of the case-sensitivity.

    public static bool IsGuid(this string input)
    {
        return Regex.Match(input, @"^ (?:{)? (?<GUID> [0-9a-f]{8} - [0-9a-f]{4} - [0-9a-f]{4} - [0-9a-f]{4} - [0-9a-f]{12} ) (?:})? $",
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled).Success;
    }

    Takes 5.6 seconds for 1,000,000 tests.

    The brute-force way takes only 1.3 seconds!

  3. It’s not true that the ‘brute force’ method described is generally much faster. In the general case not all strings tested will represent a Guid, and although setting up an exception frame is theoretically free, throwing an exception is very expensive. The method is therefore only faster if most of the strings under test represent valid Guids.

    It’s also the case that the regex example creates a new Regex object, compiles it, uses it in a single match test and then discards it, on every call. The code would actually run faster if you didn’t compile it, and faster still if you compiled it once outside of the method, since it’s a method invariant.

    Let’s call the ‘brute force’ method IsGuid2 and compare it to the method IsGuid1 defined below.

    [quote]
    // Created and compiled once, then reused by every call; matches hex digits only.
    public static readonly Regex RxGuidMatcher = new Regex(@"^[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}$", RegexOptions.Compiled|RegexOptions.CultureInvariant);

    public static bool IsGuid1(this string subject) {
        return !string.IsNullOrEmpty(subject) && RxGuidMatcher.IsMatch(subject);
    }
    [/quote]

    with the following subjects:

    [quote]
    var subjectTrue = "{60E949FF-CE08-45CE-AADD-2F2251B56589}";
    var subjectFalse = "{60E949FF-CE08-45CE-AADDG-F2251B5658X}";
    [/quote]

    Over 1000 iterations, if every test is subjectTrue, we get approximately the following metrics:

    IsGuid1: 87.1ms
    IsGuid2: 24.9ms

    So in this rather specialized case the winner is clear. But what if half of the tests were subjectFalse?

    IsGuid1: 89.7ms
    IsGuid2: 430.5ms

    Ouch. I know which implementation I’d prefer to use as a general utility. The moral of the story: if you want realistic metrics, test as many aspects of the code as possible.

    But there’s more: given what we know about a GUID, and given the goal of our test, we can write something a lot less naive:

    [quote]
    public static bool IsGuid(this string subject) {
        if (string.IsNullOrEmpty(subject)) return false;
        subject = subject.Trim();
        // Check the length before indexing so a short "{..." input cannot throw.
        var braced = subject.Length == 38 && subject[0] == '{' && subject[37] == '}';
        if (!braced && subject.Length != 36) return false;
        var offset = braced ? 1 : 0;
        for (var k = 0; k < 36; k++) {
            var suspect = subject[offset + k];
            if (k == 8 || k == 13 || k == 18 || k == 23) {
                // Hyphens are required at exactly these positions.
                if (suspect != '-') return false;
            } else if (!(('0' <= suspect && suspect <= '9')
                      || ('A' <= suspect && suspect <= 'F')
                      || ('a' <= suspect && suspect <= 'f'))) {
                return false; // every other position must be a hex digit
            }
        }
        return true;
    }
    [/quote]

    (Please note that I’ve just thrown that bit of code together and haven’t tested it fully, but you get the idea)

    Now with our same tests what can we expect:

    Over 1000 iterations, if every test is subjectTrue, we get approximately the following metrics:

    IsGuid: 1.8ms

    And when half the tests are subjectFalse:

    IsGuid: 1.5ms

    Essentially we get the same performance metrics regardless of whether the subject under test represents a valid GUID or not, and that’s much more desirable in the general case.

    And as a final note, what if we were testing a million strings, half of which were subjectTrue and the others subjectFalse?

    IsGuid: 833.8ms
    IsGuid1: 6,993.0ms
    IsGuid2: 42,821.0ms

    Yep, that’s right, ‘brute force’ comes in at over 42 seconds, no thanks!

  4. OK, I just realized that in my previous comment my regex doesn’t check for the curly braces, and subjectFalse should be {60E949FF-CE08-45CE-AADD-2F2251B5658X}. Although that changes the timings a little, it doesn’t change the overall picture.
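
To compare the approaches from the comments above on your own machine, a small harness along the following lines might do. It is only a sketch: the `IsGuidTiming` class, the method names and the braces-aware hex pattern are assumptions made for this example, and it alternates one valid and one invalid input as described in the comments. Absolute numbers will differ per machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class IsGuidTiming
{
    // Cached pattern with optional braces (an assumption for this sketch).
    static readonly Regex Pattern = new Regex(
        @"^\{?[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\}?$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static bool ByRegex(string s)
    {
        return s != null && Pattern.IsMatch(s);
    }

    // The brute-force variant: let the Guid constructor throw on bad input.
    public static bool ByException(string s)
    {
        try { new Guid(s); return true; }
        catch (ArgumentNullException) { }
        catch (FormatException) { }
        catch (OverflowException) { }
        return false;
    }

    // Times `iterations` calls, alternating a valid and an invalid input.
    public static TimeSpan Time(Func<string, bool> check, int iterations)
    {
        const string valid = "{60E949FF-CE08-45CE-AADD-2F2251B56589}";
        const string invalid = "{60E949FF-CE08-45CE-AADD-2F2251B5658X}";
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            check(i % 2 == 0 ? valid : invalid);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        Console.WriteLine("regex:     {0}", Time(ByRegex, 1000));
        Console.WriteLine("exception: {0}", Time(ByException, 1000));
    }
}
```

Running each variant once before timing it keeps one-off costs such as regex compilation out of the measurement.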
