Sept. 3, 2021, 9:02 a.m.
IT

Modern Pitfall in Code Performance

Code syntax has become increasingly complex and expressive in recent years. There used to be a time when procedural languages like Java, C, C++, C# were purely procedural and functional languages like Lisp, Erlang were purely (mostly) declarative. Since these are very different mechanisms of programming, when writing code in a specific language a programmer tends to think either declaratively or imperatively.

So consider this simple example problem: An object needs to be created with a list of entries that contain properties describing a specific year and week. Let's say that property is: Hours. So each year / week combination has one property defined called Hours. This object gets instantiated and passed the list of resolved items. It also has a bunch of properties to allow iteration over the years and weeks to build up a grid of Hours and totals. A very naive implementation might look something like this in C#:

class SomeHolder {
  private Dictionary<int, Dictionary<int, List<int>>> entities;

  public List<int> Years { 
    List<int> years = new List<int>();
    get { 
      for (int year in entities.Keys) 
        years.Add(year);
      return years;
    } 
  }

  public Dictionary<int, List<int>> Weeks { 
    get { 
      Dictionary<int, List<int>> weeks = new Dictionary<int, List<int>>();
      for (int year in Years) {
        if (!weeks.ContainsKey(year))
          weeks[year] = new List<int>();
        for (int week in entities[year].Keys) 
          weeks[year].Add(week);
      }
      return weeks;
    } 
  }

  public Dictionary<int, Dictionary<int, int>> WeekTotals {
    get {
      Dictionary<int, Dictionary<int, int>> totals = new Dictionary<int, Dictionary<int, int>>();

      for (int year in Years) {
        if (!totals.ContainsKey(year))
          totals[year] = new Dictionary<int, int>();
        for (int week in Weeks) {
          if (!totals[year].ContainsKey(week))
            totals[year][week] = 0;
          for (int hour in entities[year][week].Values) 
            totals[year][week] += hour;
        }
      }

      return totals;
    }
  }

  public SomeHolder(Dictionary<int, Dictionary<int, List<int>>> entities) {
    this.entities = entities;
  }
}

I know there is a lot wrong with it and it is very obvious when looking at it. It is obvious because:

  1. When WeekTotals is indexed and evaluated, it is clear that it accesses Years and Weeks which are properties that will iterate over the entities dictionary every time. This results in an algorithm that is probably O(n2). Point is, every time you access WeekTotals[someYear][someWeek] to get a total, the CPU has to resolve Years once which results in a loop executed numYears times, followed by numYears iterations over Weeks which is itself a loop executed numYears + numYears * numWeeks in each year times. Assuming same number of weeks in a year, the WeekTotals executes numYears + numYears * (numYears + numYears * numWeeks in each year) + numYears * numWeeks in each year * entries per week times. Simplifying results in numYears * (1 + numYears*(1+numWeeks) + numWeeks), which is equivalent to numYears2 * numWeeks assuming numYears and numWeeks >> 1 and entries per week << numYears and numWeeks. I say that "it is clear" because the loops are obvious.
  2. The logic and algorithm is very visible so it is easy to spot.

Now consider a rewrite of above code using modern C# functional style lambda syntax but done so poorly:

class SomeHolder {
  private Dictionary<int, Dictionary<int, List<int>>> entities;

  public List<int> Years => entities.Keys.ToList();

  public Dictionary<int, List<int>> Weeks => entities.Select(pair => new KeyValuePair<int, List<int>>(pair.Key, pair.Value.Select(pair2 => pair2.Key).ToList())).ToDictionary(pair => pair.Key, pair => pair.Value);

  public Dictionary<int, Dictionary<int, int>> WeekTotals => entities.Select(pair => new KeyValuePair<int, Dictionary<int,int>>(pair.Key, pair.Value.Select(pair2 => new KeyValuePair<int, int>(pair2.Key, pair2.Value.Sum(i => i))).ToDictionary(pair => pair.Key, pair => pair.Value)))
        .ToDictionary(pair => pair.Key, pair => pair.Value);

  public SomeHolder(Dictionary<int, Dictionary<int, List<int>>> entities) {
    this.entities = entities;
  }
}

And to use it, perhaps something like this:

foreach (int year in someHolder.WeekTotals.Keys) {
  Console.WriteLine($"Year: {year}");
  foreach (int week in someHolder.WeekTotals[year].Keys) {
    Console.WriteLine($"Week: {week} - Total: {someHolder.WeekTotals[year][week]}");
  }
}

There are many problems with this example. The first is that the property WeekTotals will get calculated 1 + numYears + numYears*numWeeks times which can be simplified to numYears(1+numWeeks) and that is equal to numYears*numWeeks for large values. If WeekTotals is expensive to calculate, our O(n) algorithm becomes really slow for large values. Now start adding several different totals types to the class, iterate over years and weeks and for each access the total, forget to cache the lookup because you think it is just a Dictionary and try to create a grid of 200 weeks and you have code that takes 30 minutes to execute.

The problem is that the code is very terse, condensed. It does not seem like a lot is going on. And when you use these properties it is easy to forget that they each incur a small overhead to call. Multiply that by nesting some properties inside others as dependencies and the numbers quickly become exponentially larger. Imagine a simple scenario where we want to calculate the % of a total:

public Dictionary<int, Dictionary<int, int>> Totals1 => SomeHeavyCalculation();
public Dictionary<int, Dictionary<int, int>> Totals2 => SomeOtherHavyCalculation();
Dictionary<int, Dictionary<int, float>> Percentages => Totals2.ToDictionary(pair => pair.Key, pair =>
            pair.Value.ToDictionary(pair2 => pair2.Key, pair2 => (float)pair2.Value / Totals1[pair.Key][pair2.Key] * 100));

Imagine now iterating over Years and Weeks to print Percentages[year][week]. If Percentages[year][week] was a simple Dictionary, it would have been fine. However, each invocation of Percentages[year][week] translates into an invocation of Totals2 and numYears*numWeeks invocations of Totals1. So if Totals1 and Totals2 are expensive operations, invoking Percentages[year][week] numYears*numWeeks times translates to numYears*numWeeks * (1 + numYears*numWeeks) invocations of an expensive operation, which is (numYears*numWeeks)2 - our O(n) now became an O(n2). That algorithm will tank the moment we start increasing the number of weeks. Specifically, the algorithm will run 100 times slower if we increase the number of weeks by a factor of 10.

There are several solutions. My choice is to use lambdas if they are expressive and not confusing, even if they are slow, but make sure we initialize the properties only once. This makes many assumptions - for instance it is assumed each property will always be invoked fully so lazy initialization is wasteful.

class SomeHolder {
  private Dictionary<int, Dictionary<int, List<int>>> entities;

  public List<int> Years;      
  public Dictionary<int, List<int>> Weeks;      
  public Dictionary<int, Dictionary<int, int>> WeekTotals;
  public Dictionary<int, Dictionary<int, int>> Totals1;
  public Dictionary<int, Dictionary<int, int>> Totals2;
  Dictionary<int, Dictionary<int, float>> Percentages;

  public SomeHolder(Dictionary<int, Dictionary<int, List<int>>> entities) {
    this.entities = entities;

    Years = entities.Keys.ToList();

    Weeks = entities.Select(pair => new KeyValuePair<int, List<int>>(pair.Key, pair.Value.Select(pair2 => pair2.Key).ToList())).ToDictionary(pair => pair.Key, pair => pair.Value);

    WeekTotals = entities.Select(pair => new KeyValuePair<int, Dictionary<int,int>>(pair.Key, pair.Value.Select(pair2 => new KeyValuePair<int, int>(pair2.Key, pair2.Value.Sum(i => i))).ToDictionary(pair => pair.Key, pair => pair.Value)))
        .ToDictionary(pair => pair.Key, pair => pair.Value);

    Totals1 = SomeHeavyCalculation();

    Totals2 = SomeOtherHavyCalculation();

    Percentages = Totals2.ToDictionary(pair => pair.Key, pair => pair.Value.ToDictionary(pair2 => pair2.Key, pair2 => (float)pair2.Value / Totals1[pair.Key][pair2.Key] * 100));
  }
}

This way we only invoke the slow lambdas once during initialization. Accessing them now is cheap since they are truly only Dictionaries.

This is all very obvious to most developers - lambdas in C# have been available for more than a decade now, however it is easy to fall victim to syntactic sugar and compiler magic.