I’m a bit confused reading through the rules, it seems impossible to determine when allergens may not always be listed. I probably just need to read through it again, but for now I’ll move on to today’s problem.

four hours later, afer getting stuck on 22.2…

I think this is the key that I didn’t fully appreciate on first read: “Each allergen is found in exactly one ingredient.” Let’s walk through the example:

mxmxvkd kfcds sqjhc nhms (contains dairy, fish)
trh fvjkl sbzzf mxmxvkd (contains dairy)
sqjhc fvjkl (contains soy)
sqjhc mxmxvkd sbzzf (contains fish)
  1. The first two both contain dairy, and the only common ingredient is mxmxvkd, so that must be the dairy ingredient.
  2. The first and last both contain fish, and have mxmxvkd and sqjhc in common. Since we know mxmvkd is dairy, then sqjhc must be fish.
  3. The third only contains two ingredients: we know sqjhc contains fish, which isn’t indicated, so fvjkl must be soy.
  4. So now we know the three ingredients for the three listed allergens, so the remaining ingredients cannot have them: kfcds, nhms, trh, and sbzzf.

Oh good, that agrees with the problem statement! So as with day 16 we can generalize these into several rules, that we’ll then be able to apply via set operations:

  1. If two foods contains the same allergen and only have one ingredient in common (that doesn’t have a known allergen), the allergen must be in that ingredient.
  2. If a food only contains one ingredient that is not a known allergen, and has one unknown allergen listed, the allergen must be in that ingredient.

Start with reading the food list into a data structure, using Set for the ingredients and allergens.

struct Food
Foods = Vector{Food}

We’ll then combine all foods into supersets of all ingredients and allergens, which we’ll pull from (and stick into a Dict) as things are solved:

    for f in foods
        ingredients = union!(ingredients, f.ingredients)
        allergens = union!(allergens, f.allergens)
    # map allergen to the ingredient that contains it
    known = Dict{AbstractString,AbstractString}()

Implementing the above rules is straightforward. For number one: for each allergen, collect the set of ingredients that are common to all foods with that allergen; if there is only one ingredient then it must contain that allergen.

        for a in allergens
            common = copy(ingredients)
            for f in foods
                if a in f.allergens
                    intersect!(common, f.ingredients)
            if length(common) == 1
                known[a] = first(common)

For number 2: for each food, build a set of its ingredients that are not known to be allergens, and a set of its allergens with unknown ingredients; if both of those only contain one item then that ingredient is the allergen.

        for f in foods
            ui = setdiff(f.ingredients, values(known))
            ua = setdiff(f.allergens, keys(known))
            if length(ui) == length(ua) == 1
                known[first(ua)] = first(ui)

We repeat these steps until there are no more unknown allergens (or we were unable to deduce any more allergens after applying these rules). Then we count up the non-allergenic ingredients in the foods:

    sum([length(setdiff(f.ingredients, values(known))) for f in foods])

Part 2

Well this is an unusually simple part 2 - or was there a simpler way to do part 1? In any case, now we just need to sort the known allergens and combine their respective ingredients into a list:

    l = join([known[k] for k in sort(collect(keys(known)))], ",")