Kerry Buckley What’s the simplest thing that could possibly go wrong?

18 March 2026

Zooming in on slow code with profiling and benchmarking tools

Filed under: Elixir, Software — Kerry Buckley @ 2:59 pm

I recently spent a bit of time trying to speed up some slow tests. There’s still a long way to go, but I managed to tweak some code in an integration test to speed it up and, better still, to improve the efficiency of some actual production code, where the slow test was a symptom of slow code.

The integration test performs various steps, and having previously tried a few things more-or-less at random to make it take less time, this time I was slightly more organised, and adopted the time-honoured approach of sprinkling IO.puts("Doing thing: #{DateTime.utc_now()}") lines through the code, then staring intently at the bit that took longest to see what it was up to.

The second one was less obvious. Once I’d realised that the bottleneck was the function under test, rather than something inefficient going on in the test itself, it felt like I needed to be a bit more scientific and get some actual stats on where the time was being spent – and thus I finally arrive at the point of this post.

I knew I’d used tools to do this in the past, but for the life of me couldn’t remember the name of the generic type of thing I was looking for (yes, I’m getting old). I went round the houses a bit, but for the benefit of the reader – which may just be future me – the word you want to google, along with your language of choice, is profiler. I’m writing in Elixir, and opted to use ExProf, which is a thin wrapper round Erlang’s built-in eprof.
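As an aside (this setup isn’t shown in the post, and the version constraints are assumptions – check Hex for current releases), ExProf is an ordinary Hex package, so getting started is just a matter of adding it to the deps in mix.exs, along with Benchee, which we’ll use later:

```elixir
# In mix.exs – versions are illustrative, and the deps are restricted
# to dev/test since neither tool is needed in production.
defp deps do
  [
    {:exprof, "~> 0.2", only: [:dev, :test]},
    {:benchee, "~> 1.3", only: [:dev, :test]}
  ]
end
```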

The offending function was Premonition.ManageTasks.dashboard/1, so with some representative data in the database I ran this function inside ExProf, which produced the following output:

iex(1)> import ExProf.Macro
ExProf.Macro
iex(2)> profile do: Premonition.ManageTasks.dashboard("MCC PCC-SM")
FUNCTION                                                  CALLS        %     TIME  [uS / CALLS]
--------                                                  -----  -------     ----  [----------]
maps:merge_with_1/3                                           2     0.00        0  [      0.00]
maps:to_list_internal/1                                       2     0.00        0  [      0.00]
maps:iterator/1                                               2     0.00        0  [      0.00]

[ ... lots more uninteresting lines ... ]

'Elixir.Premonition.Tasks.Config.TaskDef':get_task/1       9475     0.03      764  [      0.08]
'Elixir.Premonition.Tasks.Config':tasks/0                     1     0.03      838  [    838.00]
lists:keyfind/3                                            3004     0.04      998  [      0.33]
'Elixir.Access':get/3                                      9578     0.05     1168  [      0.12]
'Elixir.Keyword':get/3                                     2738     0.08     2154  [      0.79]
erlang:module_loaded/1                                     1087     0.09     2208  [      2.03]
'Elixir.Access':get/2                                      9578     0.10     2459  [      0.26]
lists:member/2                                             4239     0.25     6299  [      1.49]
re:import/1                                               66340     0.91    23117  [      0.35]
'Elixir.Premonition.Tasks.Config':task/1                   9475    97.95  2490083  [    262.81]
------------------------------------------------------  ------  -------  -------  [----------]
Total:                                                   212736  100.00%  2542126  [     11.95]

The profiler lists all the functions that end up getting called, both in our application code and in the standard libraries (the table is generated by the underlying Erlang tool, so if you’re only used to Elixir syntax the format will look a bit odd). Each line includes the number of times the function was called, the total time (in microseconds) spent in that function, and the percentage of the total time that represents. Clearly Premonition.Tasks.Config.task/1 is looking suspicious, being called 9,475 times and accounting for nearly 98% of the total time.

Here’s the code:

  def dashboard(node_type) do
    disabled_tasks = Repo.all(DisabledTask)
    threshold_overrides = threshold_overrides()
    
    tasks =
      Config.tasks()
      |> Map.keys()
      |> Enum.filter(&(TaskDef.node_type(&1) == node_type))
      |> Enum.sort_by(&Tasks.reference_for_sorting/1)
      |> Enum.map(&TaskInfo.new(&1, threshold_overrides, disabled_tasks))
      
    nodes =
      from(n in Node, where: n.type == ^node_type, order_by: n.name)
      |> Repo.all()
      |> Enum.map(&node_with_tasks(&1, tasks))
      
    %{nodes: nodes, tasks: tasks}
  end

Config.tasks() returns a compile-time map of around 2,800 task identifiers to task configuration structs. We’re taking the keys from that map, and filtering for the ones with the correct node type before sorting them and building the structs we need. Without going into all the details, it turns out that TaskDef.node_type/1 looks up the task in the same map we started with, using the Premonition.Tasks.Config.task/1 function that the profiler highlighted. So we’re taking every key in the map, and looking it up to get its value (which we already had, until we discarded the values with Map.keys/1). What if we filtered on the map itself, which would mean making just one pass through the map instead of one lookup per identifier? Then we can take the keys of the small subset we’re interested in.

Here’s a slightly-modified version of the function that still passes the tests:

def dashboard(node_type) do
    disabled_tasks = Repo.all(DisabledTask)
    threshold_overrides = threshold_overrides()
    
    tasks =
      Config.tasks()
      |> Map.filter(fn {_id, task} -> task.node_type == node_type end)
      |> Map.keys()
      |> Enum.sort_by(&Tasks.reference_for_sorting/1)
      |> Enum.map(&TaskInfo.new(&1, threshold_overrides, disabled_tasks))
      
    nodes =
      from(n in Node, where: n.type == ^node_type, order_by: n.name)
      |> Repo.all()
      |> Enum.map(&node_with_tasks(&1, tasks))
      
    %{nodes: nodes, tasks: tasks}
  end

There are no doubt more efficient approaches than creating a new map just to discard its values, but we’ll just make a note of that to return to later – it’s safer to only measure the effect of one change at a time.

Let’s run the profiler again (I’ve hidden everything apart from that one function call at the bottom of the table):

FUNCTION                                                  CALLS        %     TIME  [uS / CALLS]
--------                                                  -----  -------     ----  [----------]
...
'Elixir.Premonition.Tasks.Config':task/1                   6632    96.80  1742984  [    262.81]
------------------------------------------------------  ------  -------  -------  [----------]
Total:                                                   181503  100.00%  1800633  [      9.92]

Well, it’s better (about 1.7 seconds rather than 2.5), but it’s still not great, and Premonition.Tasks.Config.task/1 is still taking up the vast majority of that time. Let’s dig a bit deeper – here’s the source again, this time along with the private function node_with_tasks/2 that dashboard/1 calls for each node:

  def dashboard(node_type) do
    disabled_tasks = Repo.all(DisabledTask)
    threshold_overrides = threshold_overrides()
    
    tasks =
      Config.tasks()
      |> Map.filter(fn {_id, task} -> task.node_type == node_type end)
      |> Map.keys()
      |> Enum.sort_by(&Tasks.reference_for_sorting/1)
      |> Enum.map(&TaskInfo.new(&1, threshold_overrides, disabled_tasks))
      
    nodes =
      from(n in Node, where: n.type == ^node_type, order_by: n.name)
      |> Repo.all()
      |> Enum.map(&node_with_tasks(&1, tasks))
      
    %{nodes: nodes, tasks: tasks}
  end
  
  defp node_with_tasks(node, tasks) do
    tasks_for_node =
      tasks
      |> Enum.filter(&(node.subtype in TaskDef.node_subtypes(&1.task) and TaskDef.run_on_node?(&1.task, node)))
      |> Enum.sort_by(&Tasks.reference_for_sorting(&1.task))
      
    %{node | tasks: tasks_for_node}
  end

We’re calling another TaskDef function in that filter, and node_subtypes/1 behaves very much like node_type/1, but looks at a different field in the struct. Once again, we’re looking up each task identifier in the full map (and repeating that lookup for each node), after throwing away the task struct earlier with Map.keys/1.

Let’s split up that first pipeline and hold onto the filtered tasks. After a bit of fiddling around until the tests passed again, here’s another version. We’re now using a list of {task, task_info} tuples, so we can filter based on the task, then return the TaskInfo structs. We could have added more fields to the TaskInfo struct instead, but that didn’t quite feel right: the struct is used externally and passed between processes (which in Elixir don’t share data), so we don’t want to add to the amount of data that has to be copied when the extra fields are only used internally here.

The code’s starting to get pretty messy, but let’s stick to one thing at a time – I know the mantra is “make it work; make it right; make it fast”, but we’re on the “make it fast” bit at the moment, and we can go back and “make it right” once we’re finished with the performance improvements.

  def dashboard(node_type) do
    disabled_tasks = Repo.all(DisabledTask)
    threshold_overrides = threshold_overrides()
    
    tasks_with_info =
      Config.tasks()
      |> Map.filter(fn {_id, task} -> task.node_type == node_type end)
      |> Enum.map(fn {id, task} -> {task, TaskInfo.new(id, threshold_overrides, disabled_tasks)} end)
      |> Enum.sort_by(fn {task, _task_info} -> Tasks.reference_for_sorting(task) end)
      
    nodes =
      from(n in Node, where: n.type == ^node_type, order_by: n.name)
      |> Repo.all()
      |> Enum.map(&node_with_tasks(&1, tasks_with_info))
      
    %{nodes: nodes, tasks: Enum.map(tasks_with_info, &elem(&1, 1))}
  end
  
  defp node_with_tasks(node, tasks_with_info) do
    tasks_for_node =
      tasks_with_info
      |> Enum.filter(fn {task, _task_info} ->
        node.subtype in task.node_subtypes and TaskDef.run_on_node?(task.identifier, node)
      end)
      |> Enum.map(&elem(&1, 1))
      |> Enum.sort_by(&Tasks.reference_for_sorting(&1.task))
      
    %{node | tasks: tasks_for_node}
  end

Time for another profiler run:

FUNCTION                                                  CALLS        %     TIME  [uS / CALLS]
--------                                                  -----  -------     ----  [----------]
...
'Elixir.Premonition.Tasks.Config':task/1                   2312    95.15   644231  [    278.65]
------------------------------------------------------  ------  -------  -------  [----------]
Total:                                                   132140  100.00%   677048  [      5.12]

We’re down to well under a second now, but most of the time is still spent in calls to that one function. It’s important to consciously think about when to stop, but it feels like there’s still a decent improvement to be made (and in production there’ll be more nodes, so it’ll take longer than it does with the dummy data we’ve been using for these runs).

There’s only one place left where we call a TaskDef function, in the filter inside node_with_tasks/2:

  defp node_with_tasks(node, tasks_with_info) do
    tasks_for_node =
      tasks_with_info
      |> Enum.filter(fn {task, _task_info} ->
        node.subtype in task.node_subtypes and TaskDef.run_on_node?(task.identifier, node)
      end)
      |> Enum.map(&elem(&1, 1))
      |> Enum.sort_by(&Tasks.reference_for_sorting(&1.task))
      
    %{node | tasks: tasks_for_node}
  end

Let’s look at TaskDef.run_on_node?/2:

  def run_on_node?(task_id, node) do
    module = module(task_id)
    
    if function_exported?(module, :run_on_node?, 1) do
      module.run_on_node?(node)
    else
      true
    end
  end

It’s a bit more complicated than the other ones we effectively inlined, because it checks whether the implementing module for a task has a run_on_node?/1 function, and calls it if so, otherwise defaulting to true. We don’t really want to replicate this logic, so instead we can allow the function to operate on either a task identifier or a TaskDef struct. Let’s add a test:

  describe "Premonition.Tasks.Config.TaskDef.run_on_node?/2" do
    ...
    test "supports TaskDef structs as well as identifiers" do
      assert TaskDef.run_on_node?(Config.task("SignallingMaintenanceServer.NumberOfEPAPBackups"), %Node{id: 1}) ==
               false
    end
    ...
  end

And make it pass:

  def run_on_node?(%TaskDef{} = task, node), do: do_run_on_node?(task.module, node)
  def run_on_node?(task_id, node), do: do_run_on_node?(module(task_id), node)
  
  defp do_run_on_node?(module, node) do
    if function_exported?(module, :run_on_node?, 1) do
      module.run_on_node?(node)
    else
      true
    end
  end

Now we can pass the struct in from node_with_tasks/2:

  def dashboard(node_type) do
    disabled_tasks = Repo.all(DisabledTask)
    threshold_overrides = threshold_overrides()
    
    tasks_with_info =
      Config.tasks()
      |> Map.filter(fn {_id, task} -> task.node_type == node_type end)
      |> Enum.map(fn {id, task} -> {task, TaskInfo.new(id, threshold_overrides, disabled_tasks)} end)
      |> Enum.sort_by(fn {task, _task_info} -> Tasks.reference_for_sorting(task) end)
      
    nodes =
      from(n in Node, where: n.type == ^node_type, order_by: n.name)
      |> Repo.all()
      |> Enum.map(&node_with_tasks(&1, tasks_with_info))
      
    %{nodes: nodes, tasks: Enum.map(tasks_with_info, &elem(&1, 1))}
  end
  
  defp node_with_tasks(node, tasks_with_info) do
    tasks_for_node =
      tasks_with_info
      |> Enum.filter(fn {task, _task_info} ->
        node.subtype in task.node_subtypes and TaskDef.run_on_node?(task, node)
      end)
      |> Enum.map(&elem(&1, 1))
      |> Enum.sort_by(&Tasks.reference_for_sorting(&1.task))
      
    %{node | tasks: tasks_for_node}
  end

Let’s check we’re still heading in the right direction:

FUNCTION                                                  CALLS        %     TIME  [uS / CALLS]
--------                                                  -----  -------     ----  [----------]
...
'Elixir.Premonition.Tasks.Config':task/1                   1231    91.03   339719  [    275.97]
------------------------------------------------------  ------  -------  -------  [----------]
Total:                                                   116725  100.00%   373176  [      3.20]

Looks good – we’ve halved the number of calls to Config.task/1 again, and we’re now down to a third of a second. Just one more step to go! At least I think so – I’m genuinely writing this up as I work through the process.

We still call Tasks.reference_for_sorting/1 twice, and this seems like another likely candidate for eliminating unnecessary lookups. Looking more closely, though, dashboard/1 passes in a struct while node_with_tasks/2 passes a task ID. The function clearly supports a struct argument already, so we could just pass in a struct both times to avoid the lookups. But hang on a minute – why are we repeating the sort in node_with_tasks/2 anyway? We already sorted the structs before passing them into this function. Sure enough, we can remove that line altogether without causing any tests to fail:

  defp node_with_tasks(node, tasks_with_info) do
    tasks_for_node =
      tasks_with_info
      |> Enum.filter(fn {task, _task_info} ->
        node.subtype in task.node_subtypes and TaskDef.run_on_node?(task, node)
      end)
      |> Enum.map(&elem(&1, 1))
      
    %{node | tasks: tasks_for_node}
  end

One last run of the profiler:

FUNCTION                                                  CALLS        %     TIME  [uS / CALLS]
--------                                                  -----  -------     ----  [----------]
...
'Elixir.Premonition.Tasks.Config':task/1                    144    63.41    39575  [    274.83]
------------------------------------------------------  ------  -------  -------  [----------]
Total:                                                    60015  100.00%    62413  [      1.04]

Great! We’ve gone from a function call taking a glacial 2.5s to a much more reasonable 62ms (a 40× improvement). In the words of James Cromwell, “That’ll do, Pig.”

Before we go, though, remember way back near the beginning when I said “there are no doubt more efficient approaches than creating a new map just to discard its values, but we’ll just make a note of that to return to later”? Let’s take a quick look at that now.

When it comes to choosing between algorithms based on performance, the profiler we’ve been using up to now isn’t quite the right tool for the job. Instead, we’re going to reach for a benchmarker – in this case Benchee. This tool runs multiple potential implementations many times each, to give a clear picture of which is faster.

We could just create a duplicate module, tweak the code in one of them and compare performance of the whole function, but let’s isolate the specific lines we want to experiment with, so we’re only looking at the time spent on that one operation.

The code’s changed a bit since we noticed the potentially inefficient creation of a new map, but in the current version it’s the Map.filter/2 and Enum.map/2 calls at the top of the tasks_with_info pipeline:

  def dashboard(node_type) do
    disabled_tasks = Repo.all(DisabledTask)
    threshold_overrides = threshold_overrides()
    
    tasks_with_info =
      Config.tasks()
      |> Map.filter(fn {_id, task} -> task.node_type == node_type end)
      |> Enum.map(fn {id, task} -> {task, TaskInfo.new(id, threshold_overrides, disabled_tasks)} end)
      |> Enum.sort_by(fn {task, _task_info} -> Tasks.reference_for_sorting(task) end)
      
    nodes =
      from(n in Node, where: n.type == ^node_type, order_by: n.name)
      |> Repo.all()
      |> Enum.map(&node_with_tasks(&1, tasks_with_info))
      
    %{nodes: nodes, tasks: Enum.map(tasks_with_info, &elem(&1, 1))}
  end

Another version, which doesn’t create an intermediate map, might look like this:

  def dashboard(node_type) do
    disabled_tasks = Repo.all(DisabledTask)
    threshold_overrides = threshold_overrides()
    
    tasks_with_info =
      Config.tasks()
      |> Enum.flat_map(fn
        {id, task} when task.node_type == node_type -> [{task, TaskInfo.new(id, threshold_overrides, disabled_tasks)}]
        _ -> []
      end)
      |> Enum.sort_by(fn {task, _task_info} -> Tasks.reference_for_sorting(task) end)
      
    nodes =
      from(n in Node, where: n.type == ^node_type, order_by: n.name)
      |> Repo.all()
      |> Enum.map(&node_with_tasks(&1, tasks_with_info))
      
    %{nodes: nodes, tasks: Enum.map(tasks_with_info, &elem(&1, 1))}
  end

Having confirmed that this version passes all the tests, let’s compare its performance with the original. We can copy the relevant bits of code into a script, hard-coding some values, and save it as benchmark.exs:

defmodule Benchmark do
  @moduledoc false
  alias Premonition.ManageTasks.TaskInfo
  
  def filter_and_map(tasks) do
    tasks
    |> Map.filter(fn {_id, task} -> task.node_type == "MCC PCC-SM" end)
    |> Enum.map(fn {id, task} -> {task, TaskInfo.new(id, [], [])} end)
  end
  
  def flat_map(tasks) do
    Enum.flat_map(tasks, fn
      {id, task} when task.node_type == "MCC PCC-SM" -> [{task, TaskInfo.new(id, [], [])}]
      _ -> []
    end)
  end
end

tasks = Premonition.Tasks.Config.tasks()
 
Benchee.run(%{
  "Filter and map" => fn -> Benchmark.filter_and_map(tasks) end,
  "Flat map" => fn -> Benchmark.flat_map(tasks) end
})

The important bit is at the bottom, where we tell Benchee to measure the two functions, which we’ve given friendly labels for the report. By default it will repeatedly call each function for two seconds to make sure any caches are warmed up, then keep going for another five seconds, counting how many times the call completes (there are loads of configuration options that we’re ignoring here).
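Those defaults can be overridden by passing an options list to Benchee.run/2 – for example (the specific values here are just for illustration, not what was used for the runs in this post):

```elixir
# Hypothetical tweak of the benchmark script's Benchee.run call:
# shorter warmup, longer measurement time, plus memory measurements.
Benchee.run(
  %{
    "Filter and map" => fn -> Benchmark.filter_and_map(tasks) end,
    "Flat map" => fn -> Benchmark.flat_map(tasks) end
  },
  warmup: 1,       # seconds of unmeasured calls before timing starts
  time: 10,        # seconds of measured runs per scenario
  memory_time: 2   # also measure memory usage, for 2 seconds
)
```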

Running the script with iex -S mix run benchmark.exs produces the following output:

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.19.5
Erlang 28.3.3
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s
Excluding outliers: false

Benchmarking Filter and map ...
Benchmarking Flat map ...
Calculating statistics...
Formatting results...

Name                     ips        average  deviation         median         99th %
Filter and map         25.45       39.29 ms     ±2.15%       39.15 ms       41.46 ms
Flat map               24.20       41.32 ms     ±4.22%       40.83 ms       51.64 ms

Comparison:
Filter and map         25.45
Flat map               24.20 - 1.05x slower +2.04 ms

So it turns out the flat_map version is actually marginally slower! We’ll stick to the code we already had, and remember the importance of actually measuring the effect when we get tempted to write “clever” code that we think ought to be faster.

And now it’s time to stop focussing on performance, decide whether our modifications need any refactoring for readability, get the changes deployed so the users can see the page load faster, and move on to the next feature!

Epilogue

A few hours later, and the new code is now live. I visited the Manage Tasks page a few times before the change, and the Chrome developer tools were showing the Largest Contentful Paint as around 25–30 seconds (which it not unreasonably described as “bad”). Now it’s down to just over a second, which qualifies as “good”.

17 July 2009

Naresh Jain’s refactoring teaser

Filed under: Ruby — Kerry Buckley @ 8:03 am

Naresh Jain recently posted a refactoring teaser. The original code was in Java, but I thought I’d have a go at refactoring it in Ruby instead. I deliberately didn’t look at Naresh’s solution beforehand, so mine goes in a rather different direction.

My code’s here (with the specs in the same file for simplicity), and you can step through the history to see the state at each step of the refactoring.

Firstly, I converted the code to Ruby, and made sure the tests (which I converted to RSpec) still passed. It’s a fairly straight conversion, although I made a couple of changes while I was at it – mainly turning it into a module which I mixed into String. I changed the name of the method to phrases, to avoid conflicting with the built-in String#split. The regular expression to split on is much simpler too, because Ruby didn’t understand the original, and I had no idea what it was supposed to do anyway.

Here’s my initial Ruby version:

#!/usr/bin/env ruby
#
#See http://blogs.agilefaqs.com/2009/07/08/refactoring-teaser-part-1/

require 'spec'

module StringExtensions
  REGEX_TO_SPLIT_ALONG_WHITESPACES = /\s+/
 
  def phrases(number)
    list_of_keywords = ""
    count = 0
    strings = split(REGEX_TO_SPLIT_ALONG_WHITESPACES)
    all_strings = single_double_triple_words(strings)
    size = all_strings.size
    all_strings.each do |phrase|
      break if count == number
      list_of_keywords += "'" + phrase + "'"
      count += 1
      if (count < size && count < number)
        list_of_keywords += ", "
      end
    end
    return list_of_keywords
  end
 
  private
 
  def single_double_triple_words(strings)
    all_strings = []
    num_words = strings.size
 
    return all_strings unless has_enough_words(num_words)
 
    # Extracting single words. Total size of words == num_words
 
    # Extracting single-word phrases.
    (0...num_words).each do |i|
      all_strings << strings[i]
    end
 
    # Extracting double-word phrases
    (0...num_words - 1).each do |i|
      all_strings << "#{strings[i]} #{strings[i + 1]}"
    end
 
    # Extracting triple-word phrases
    (0...num_words - 2).each do |i|
      all_strings << "#{strings[i]} #{strings[i + 1]} #{strings[i + 2]}"
    end
    return all_strings
  end
  
  def has_enough_words(num_words)
    num_words >= 3
  end
end

String.send(:include, StringExtensions)

describe StringExtensions do
  it 'finds all phrases' do
    'Hello World Ruby'.phrases(6).should == "'Hello', 'World', 'Ruby', 'Hello World', 'World Ruby', 'Hello World Ruby'"
  end

  it 'returns all phrases when asked for more than exist' do
    'Hello World Ruby'.phrases(10).should == "'Hello', 'World', 'Ruby', 'Hello World', 'World Ruby', 'Hello World Ruby'"
  end

  it 'returns the first n phrases when asked for fewer than exist' do
    'Hello World Ruby'.phrases(4).should == "'Hello', 'World', 'Ruby', 'Hello World'"
  end

  it 'returns the first word when asked for one phrase' do
    'Hello World Ruby'.phrases(1).should == "'Hello'"
  end
end

I didn’t change the specs at all during the refactoring (because I didn’t change the API or add any new public methods or classes), and made sure they all passed at each step.

The first thing I changed was to simplify that big iterator in phrases that loops through the list of phrases, formatting the output string. Basically all this does is put each phrase in quotes, then stitch them all together separated by a comma and a space. The first of those tasks is a simple map, and the second is a join. The whole method collapses down to this (Ruby methods automatically return the result of their last expression):

def phrases(number)
  strings = split(REGEX_TO_SPLIT_ALONG_WHITESPACES)
  all_strings = single_double_triple_words(strings)
  all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
end

Next I remembered that by default String#split splits at whitespace anyway, so I did away with the regular expression. Then I renamed the strings variable to words to make its purpose a little clearer, leaving the phrases method looking like this:

def phrases(number)
  words = split
  all_strings = single_double_triple_words(words)
  all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
end

The section of single_double_triple_words that extracted the single words seemed redundant, as we already had that list – the original words. I removed it, and initialised all_strings to the word list instead (not forgetting to rename single_double_triple_words to match its new behaviour):

module StringExtensions
  def phrases(number)
    words = split
    all_strings = words
    all_strings += double_triple_words(words)
    all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
  end
 
  private
 
  def double_triple_words(strings)
    all_strings = []
    num_words = strings.size
 
    return all_strings unless has_enough_words(num_words)
 
    # Extracting double-word phrases
    (0...num_words - 1).each do |i|
      all_strings << "#{strings[i]} #{strings[i + 1]}"
    end
 
    # Extracting triple-word phrases
    (0...num_words - 2).each do |i|
      all_strings << "#{strings[i]} #{strings[i + 1]} #{strings[i + 2]}"
    end
    return all_strings
  end
  
  def has_enough_words(num_words)
    num_words >= 3
  end
end

That has_enough_words method seemed a bit odd – particularly the way it was only called once, rather than after extracting each set of phrases. I decided it was probably a premature and incomplete attempt at optimisation, and removed it for now.

My next target was the duplication in the blocks that calculate double- and triple-word phrases. First I extracted them into separate methods:

module StringExtensions
  def phrases(number)
    words = split
    all_strings = words
    all_strings += double_words(words)
    all_strings += triple_words(words)
    all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
  end
 
  private
 
  def double_words(strings)
    all_strings = []
    num_words = strings.size
 
    # Extracting double-word phrases
    (0...num_words - 1).each do |i|
      all_strings << "#{strings[i]} #{strings[i + 1]}"
    end
    return all_strings
  end
 
  def triple_words(strings)
    all_strings = []
    num_words = strings.size
 
    (0...num_words - 2).each do |i|
      all_strings << "#{strings[i]} #{strings[i + 1]} #{strings[i + 2]}"
    end
    return all_strings
  end
end

I decided that the num_words variable in the two new methods wasn't really necessary (it was only used once, and I think strings.size expresses intent perfectly clearly), so I inlined it (and did the same in triple_words):

def double_words(strings)
  all_strings = []

  # Extracting double-word phrases
  (0...strings.size - 1).each do |i|
    all_strings << "#{strings[i]} #{strings[i + 1]}"
  end
  return all_strings
end

Most of the code in double_words and triple_words was obviously very similar, so I created a general extract_phrases method, and called it from both. The new method uses the start position and length version of Array#[] to extract the appropriate number of words, then Array#join to string them together separated by spaces:

def double_words(strings)
  extract_phrases(strings, 2)
end

def triple_words(strings)
  extract_phrases(strings, 3)
end

def extract_phrases(strings, number_of_words)
  result = []
  (0...strings.size - number_of_words + 1).each do |i|
    phrase = strings[i, number_of_words].join(' ')
    result << phrase
  end
  result
end

At this point double_words and triple_words have become just dumb wrappers around extract_phrases, so I removed them and just called extract_phrases directly:

def phrases(number)
  words = split
  all_strings = words
  all_strings += extract_phrases(words, 2)
  all_strings += extract_phrases(words, 3)
  all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
end

Rather than hardcoding the calls for two and three words, I changed it to use a loop:

def phrases(number)
  words = split
  all_strings = words
  (2..words.size).each do |number_of_words|
    all_strings += extract_phrases(words, number_of_words)
  end
  all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
end

I decided this was the point to put the optimisation back and stop looking for phrases once we had enough:

def phrases(number)
  words = split
  all_strings = words
  (2..words.size).each do |number_of_words|
    break if all_strings.length >= number
    all_strings += extract_phrases(words, number_of_words)
  end
  all_strings[0, number].map {|s| "'#{s}'"}.join(', ')
end

I decided that number_of_words wasn't particularly clear, so changed it to phrase_length, then made the iterator in extract_phrases more Ruby-like by using map:

def extract_phrases(strings, phrase_length)
  (0...strings.size - phrase_length + 1).map do |i|
    strings[i, phrase_length].join(' ')
  end
end

I then noticed that I hadn't been consistent in changing strings to words, so fixed that.

Lastly, I decided that even though it was only one line, the code that formats the output deserved pulling out into a method.

Here's my final version of the code:

module StringExtensions
  def phrases(number)
    words = split
    all_strings = words
    (2..words.size).each do |phrase_length|
      break if all_strings.length >= number
      all_strings += extract_phrases(words, phrase_length)
    end
    format_output(all_strings[0, number])
  end
 
  private
 
  def extract_phrases(words, phrase_length)
    (0...words.size - phrase_length + 1).map do |i|
      words[i, phrase_length].join(' ')
    end
  end

  def format_output(phrases)
    phrases.map {|s| "'#{s}'"}.join(', ')
  end
end
