HWYDI: String splitting with exceptions
August 22nd, 2008

the Problem

In an application I’m writing, I parse information from a wiki and format it as XML. Some of the data needs to be split into an array, for example:

"To Do: Update description, add more details (client, date, ...), categorize, publish"
# Should become
{"To Do" => ["Update description", "add more details (client, date, ...)", "categorize", "publish"]}

As you can guess, the problem is the commas between the parentheses. I can’t just split on commas, because then I’d get [..., "add more details (client", "date", "...)", ...]

my Solution

Nothing yet. I tried something that looped over every char with a flag for whether to split or not, but that (of course) doesn’t work with a Regexp, so I’m back to square one.

How would You do it?


4 Responses to “HWYDI: String splitting with exceptions”

  • about 1 month ago Bramus! said

    Once had a project that needed to split in quite the same manner. Tried to whip up a nifty RegEx, yet it was a massive FAIL.

    Then went for the Briek & Brak solution, in order to get results fast (fast as in ‘today’, not ‘time it takes to execute’):

    • Split on the commas
    • Loop all parts
    • If the previous part has an unclosed opening bracket, merge the two (and keep merging until you reach the closing bracket)

    Told you it was B&B (Quick & Dirty) :-P
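
The three steps above could be sketched in Ruby roughly as follows. This is only one reading of the description: the helper name `split_and_merge` is made up for illustration, and "has an opening bracket" is interpreted as "contains more `(` than `)`".

```ruby
# Split on the delimiter, then glue back together any pieces that were
# cut apart inside a parenthesis that had not been closed yet.
# NOTE: split_and_merge is a hypothetical name, not code from the post.
def split_and_merge(text, delim = ',')
  parts = []
  text.split(delim).each do |piece|
    last = parts.last
    # The previous part is unfinished if it opened more "(" than it closed.
    if last && last.count('(') > last.count(')')
      parts[-1] = last + delim + piece
    else
      parts << piece
    end
  end
  parts.map(&:strip)
end

p split_and_merge("Update description, add more details (client, date, ...), categorize, publish")
# => ["Update description", "add more details (client, date, ...)", "categorize", "publish"]
```

Because it compares counts rather than looking for a single bracket, this version should also cope with nested parentheses, as long as they are balanced.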

  • about 1 month ago Frank said

    That’s exactly what I proposed yesterday afternoon, but Jan is a purist :)

  • about 1 month ago Inferis said

    I’d throw “CSV regex” into Google, look for a regex that works and then adapt it (quotes become braces, no big deal). :/
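
For what it's worth, a single-regex version might look like the sketch below. It is not an adapted CSV pattern but a negative lookahead: split on a comma only if it is not followed by a closing parenthesis with no other parenthesis in between. The constant and method names are invented for this example, and it assumes the parentheses are balanced and not nested.

```ruby
# A comma counts as a separator only when the text after it does NOT
# reach a ")" before hitting another "(" or ")".
# NOTE: hypothetical names; assumes balanced, non-nested parentheses.
COMMA_OUTSIDE_PARENS = /,(?![^()]*\))/

def split_outside_parens_regex(text)
  text.split(COMMA_OUTSIDE_PARENS).map(&:strip)
end

p split_outside_parens_regex("Update description, add more details (client, date, ...), categorize, publish")
# => ["Update description", "add more details (client, date, ...)", "categorize", "publish"]
```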

  • about 1 month ago Zach said

    You could replace all commas within parentheses with some other character or sequence of characters, then split it by commas, and then replace the replacements with commas.

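    class String
      # Split on delim, ignoring any delim that sits inside (non-nested) parentheses.
      # temp_replace must be a placeholder that never occurs in the string itself.
      def split_ignoring_parentheses(delim = ',', temp_replace = '``')
        # Mask delimiters inside each (...) group, then split on the remaining ones.
        split_with_replacements = gsub(/\(.*?\)/) { |s| s.gsub(delim, temp_replace) }.split(delim)
        # Restore the masked delimiters and trim surrounding whitespace.
        split_with_replacements.map { |e| e.gsub(temp_replace, delim).strip }
      end
    end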
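
The non-greedy `\(.*?\)` in the method above stops at the first closing parenthesis, so nested parentheses would not be fully covered. If that matters, a plain character scan with a depth counter is one alternative. The sketch below is an assumption-laden illustration, not anyone's posted solution: `split_outside_parens` is a made-up name, and the delimiter is assumed to be a single character.

```ruby
# Walk the string once, tracking how deep we are inside parentheses,
# and start a new part only at a delimiter found at depth zero.
# NOTE: split_outside_parens is a hypothetical helper for illustration.
def split_outside_parens(text, delim = ',')
  parts = [String.new]
  depth = 0
  text.each_char do |ch|
    depth += 1 if ch == '('
    depth -= 1 if ch == ')' && depth > 0
    if ch == delim && depth.zero?
      parts << String.new
    else
      parts.last << ch
    end
  end
  parts.map(&:strip)
end

# Building the hash from the original example line:
line = "To Do: Update description, add more details (client, date, ...), categorize, publish"
key, rest = line.split(':', 2)
p({ key.strip => split_outside_parens(rest) })
# => {"To Do" => ["Update description", "add more details (client, date, ...)", "categorize", "publish"]}
```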
    
