c# - Better regular expression for ReverseStringFormat

Question:

For a while I've been using this neat function, which I found here on SO:

private List<string> ReverseStringFormat(string template, string str)
{
    string pattern = "^" + Regex.Replace(template, @"\{[0-9]+\}", "(.*?)") + "$";
    Regex r = new Regex(pattern);
    Match m = r.Match(str);

    List<string> ret = new List<string>();
    for (int i = 1; i < m.Groups.Count; i++)
        ret.Add(m.Groups[i].Value);

    return ret;
}

This function correctly processes templates like:

My name is {0} and I'm {1} years old

But it fails with templates like:

My name is {0} and I'm {1:00} years old

I would like to handle this failing scenario and add fixed length parsing.

The function transforms the (first) template as follows:

My name is (.*?) and I'm (.*?) years old

I've been trying to rewrite the regular expression above to limit the number of characters captured by the second group, without success. This is my (terrible) attempt:

My name is (.*?) and I'm (.{2}) years old

I've also been trying to process inputs like the following, but the PATTERN below doesn't work:

PATTERN: My name is (.*?) (.{3})(.{5})

INPUT: My name is John 123ABCDE

EXPECTED OUTPUT: John, 123, ABCDE

Any suggestion is highly appreciated.

Answer:

It is highly unlikely that you will be able to measure the length of a captured group within the same Regex replacement.

I would strongly suggest a state machine instead, such as the implementation below. Note that it also accounts for the doubled opening curly brace escape ({{) of string.Format.

First you will need a state enum, very much like this one:

public enum State {
    Outside,
    OutsideAfterCurly,
    Inside,
    InsideAfterColon
}

Then you need a way to walk the template one character at a time. The chars parameter is your template, and the returned IEnumerable<string> yields consecutive parts of the resulting pattern:

public static IEnumerable<string> InnerTransmogrify(string chars) {
    State state = State.Outside;
    int counter = 0;

    foreach (var @char in chars) {
        switch (state) {
            case State.Outside:
                switch (@char) {
                    case '{':
                        state = State.OutsideAfterCurly;
                        break;
                    default:
                        // Escape regex metacharacters in literal template text.
                        yield return Regex.Escape(@char.ToString());
                        break;
                }
                break;
            case State.OutsideAfterCurly:
                switch (@char) {
                    case '{':
                        // "{{" is an escaped brace: emit a literal '{' for the pattern.
                        state = State.Outside;
                        yield return @"\{";
                        break;
                    default:
                        state = State.Inside;
                        counter = 0;
                        yield return "(.";
                        break;
                }
                break;
            case State.Inside:
                switch (@char) {
                    case '}':
                        state = State.Outside;
                        yield return "*?)";
                        break;
                    case ':':
                        state = State.InsideAfterColon;
                        break;
                    default:
                        break;
                }
                break;
            case State.InsideAfterColon:
                switch (@char) {
                    case '}':
                        state = State.Outside;
                        yield return "{" + counter + "})";
                        break;
                    default:
                        counter++;
                        break;
                }
                break;
        }
    }
}

You could join the parts like so:

public static string Transmogrify(string chars) {
    var parts = InnerTransmogrify(chars);
    var result = string.Join("", parts);
    return result;
}

And then wrap everything up, like you originally intended:

private List<string> ReverseStringFormat(string template, string str) {
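    // <<SOME_PLACE>> stands for whichever class declares Transmogrify.
    // Anchor the pattern, as the original function did, so a trailing lazy group cannot match empty.
    string pattern = "^" + <<SOME_PLACE>> .Transmogrify(template) + "$";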
    Regex r = new Regex(pattern);
    Match m = r.Match(str);

    List<string> ret = new List<string>();
    for (int i = 1; i < m.Groups.Count; i++)
        ret.Add(m.Groups[i].Value);

    return ret;
}
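
As a quick sanity check on the kind of pattern the state machine is meant to emit, the sketch below runs such patterns directly against sample input. PatternCheck and Capture are hypothetical names introduced only for this illustration, and the input strings are made up; the patterns themselves are the ones from the question, wrapped in ^ and $ as in the original function.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Returns the captured groups joined by ", ", or null when the input does not match.
    public static string Capture(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return null;

        // Group 0 is the whole match, so the captures start at index 1.
        var parts = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
            parts[i - 1] = m.Groups[i].Value;

        return string.Join(", ", parts);
    }
}
```

Calling Capture with "^My name is (.*?) and I'm (.{2}) years old$" and "My name is John and I'm 07 years old" gives "John, 07". The anchors matter: without them, "My name is (.*?)" matched against "My name is John" captures an empty string, because a lazy group at the end of a pattern is satisfied by zero characters.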

I hope this shows why, at least as far as I understand it, the regex language alone isn't expressive enough for this sort of job.

Answer:

The only way to solve your problem with regular expressions is to use a custom match evaluator (the MatchEvaluator overload of Regex.Replace) that turns the width of each format string into a fixed capture length.

The code below does this for your example:

private static string PatternFromStringFormat(string template)
{
    // replaces only elements like {0}
    string firstPass = Regex.Replace(template, @"\{[0-9]+\}", "(.*?)");
    // replaces elements like {0:000} using a custom matcher
    string secondPass = Regex.Replace(firstPass, @"\{[0-9]+\:(?<len>[0-9]+)\}",
        (match) =>
        {
            // The width of the format string (e.g. "000" -> 3) becomes a fixed-length quantifier.
            var len = match.Groups["len"].Value.Length;
            return "(.{" + len + "})";
        });

    return "^" + secondPass + "$";
}

private static List<string> ReverseStringFormat(string template, string str)
{
    string pattern = PatternFromStringFormat(template);

    Regex r = new Regex(pattern);
    Match m = r.Match(str);

    List<string> ret = new List<string>();
    for (int i = 1; i < m.Groups.Count; i++)
        ret.Add(m.Groups[i].Value);

    return ret;
}
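
For comparison, here is a minimal single-pass sketch of the same idea that also escapes the literal text between placeholders. It is not the code from either answer: ReverseFormatSketch, BuildPattern and Parse are hypothetical names, and it assumes that the formatted value is exactly as wide as its format string (true for "00" with values below 100, not true in general).

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class ReverseFormatSketch
{
    // Matches {0} as well as {0:000}; the optional "fmt" group holds the format string.
    private static readonly Regex Placeholder =
        new Regex(@"\{[0-9]+(?::(?<fmt>[^}]+))?\}");

    public static string BuildPattern(string template)
    {
        var sb = new StringBuilder("^");
        int last = 0;
        foreach (Match m in Placeholder.Matches(template))
        {
            // Escape the literal text so that '.', '(' and friends match themselves.
            sb.Append(Regex.Escape(template.Substring(last, m.Index - last)));

            // Assumption: a format string of width n produces exactly n characters.
            Group fmt = m.Groups["fmt"];
            sb.Append(fmt.Success ? "(.{" + fmt.Value.Length + "})" : "(.*?)");
            last = m.Index + m.Length;
        }
        sb.Append(Regex.Escape(template.Substring(last)));
        return sb.Append("$").ToString();
    }

    public static List<string> Parse(string template, string input)
    {
        var result = new List<string>();
        Match m = Regex.Match(input, BuildPattern(template));

        // When nothing matches, only group 0 exists and the list stays empty.
        for (int i = 1; i < m.Groups.Count; i++)
            result.Add(m.Groups[i].Value);
        return result;
    }
}
```

With the template "My name is {0} {1:000}{2:00000}" and the input "My name is John 123ABCDE", Parse returns John, 123 and ABCDE, which is the expected output from the question. This sketch does not handle the {{ and }} escapes of string.Format.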