Posted on May 24, 2025

This is meant to be an objdump of measures I’ve taken to bump the performance of Neovim. But I make no guarantees about observable speed improvements.

# Use bytecode

Neovim recently added a feature that compiles Lua to bytecode and stores it in a cache, located by default at $HOME/.cache/nvim/luac/. To enable and use this cache, add the following to your init.lua:

```lua
vim.loader.enable(true)
```

This significantly improves startup time for me, since Lua no longer has to re-compile .lua files on every launch. Prior to enabling lazy loading, this alone took my startup from ~300ms down to ~200ms.
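
One detail worth calling out: the call should come as early as possible in init.lua, before other modules are required, since (as far as I can tell) anything required before the loader is enabled goes through the normal path and misses the cache. A minimal sketch of what I mean; the module names below are placeholders, not my actual config:

```lua
-- init.lua
-- Enable the byte-compiled module cache before anything else is required,
-- so every subsequent require() can be served from ~/.cache/nvim/luac/.
vim.loader.enable(true)

-- Placeholder module names; substitute whatever your config is split into.
require("options")
require("keymaps")
require("plugins")
```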

To see the startup time, I used:

```sh
nvim --startuptime bmfile
```

where bmfile is a temporary file that Neovim writes the timing log to.

Note: the first launch after enabling the cache is slower, since Neovim has to perform the compilation step before anything can be reused.

Note: this isn’t purely a startup-time optimization. Once lazy loading is set up, plugins get loaded at runtime, so the bytecode cache helps runtime performance as well.

# Lazy Loading

Conceptually, lazy loading defers loading a Neovim plugin until some event (for example, an autocmd) triggers it. This allows for a faster startup, and possibly a slightly faster Neovim while the plugins remain unloaded.

There are a couple of options for lazy loaders. I ended up using lze.

The way this works is in a couple of steps:

First, mark the plugins you want to load lazily as optional. In mnw, this is done by setting the plugins.opt attribute in the mnw.lib.wrap function (example).

Then, include a snippet that provides a trigger and initialization options. For example,

require("lze").load {
  "ferris-nvim", -- plugin name to load
  ft = {"rust"}, -- the event is opening a rust file
  after = function()
    -- configuration that is performed when the event is hit
    require('ferris').setup({})
    vim.api.nvim_set_keymap('n', '<leader>rl', '<cmd>lua require("ferris.methods.view_memory_layout")()<cr>', {})
    vim.api.nvim_set_keymap('n', '<leader>rhi', '<cmd>lua require("ferris.methods.view_hir")()<cr>', {})
    vim.api.nvim_set_keymap('n', '<leader>rmi', '<cmd>lua require("ferris.methods.view_mir")()<cr>', {})
    vim.api.nvim_set_keymap('n', '<leader>rb', '<cmd>lua require("ferris.methods.rebuild_macros")()<cr>', {})
    vim.api.nvim_set_keymap('n', '<leader>rm', '<cmd>lua vim.cmd.RustLsp("expandMacro")<cr>', {})
  end,
}
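
The ft trigger above is only one of lze’s handlers; it also ships handlers such as cmd, event, and keys. As a hedged sketch (the plugin name and command below are illustrative placeholders, not something from my config), a plugin you only want on demand could be tied to its user command instead:

```lua
-- Load the plugin the first time its user command is run, instead of on a filetype.
-- "oil-nvim" and the Oil command are placeholders for illustration.
require("lze").load {
  "oil-nvim",
  cmd = { "Oil" }, -- the event is running :Oil for the first time
  after = function()
    require("oil").setup({})
  end,
}
```

The shape is the same as the ft example: the trigger decides when the plugin gets loaded, and the after callback runs the real setup once it has been loaded.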

# CPU optimizations (Cross Compilation)

I’ve based this on this reddit post.

We would like CPU-specific instructions (e.g. AVX or SSE4) to be enabled for the computer we build on. In some cases, this can be faster than building without them.

To enable arch-specific CPU optimizations, we specify the localSystem, hostSystem, and targetSystem when importing nixpkgs. In particular:

```nix
import pkgs {
  inherit overlays;
  # the system being built *on*
  localSystem = {
    system = "x86_64-linux";
    gcc.arch = "znver3";
    gcc.tune = "znver3";
    gcc.abi = "64";
  };
  # the system to build *for*
  hostSystem = {
    system = "x86_64-linux";
    gcc.arch = "znver3";
    gcc.tune = "znver3";
    gcc.abi = "64";
  };
  # compilers built here will emit binaries with vector instructions
  # (probably doesn't apply much)
  targetSystem = {
    system = "x86_64-linux";
    gcc.arch = "znver3";
    gcc.tune = "znver3";
    gcc.abi = "64";
  };
}
```

The gcc.* and system attributes will of course be processor-specific, but in the case of my AMD Ryzen 9 5950X they enable the extra instruction sets supported by the Zen 3 microarchitecture.

But that’s it.

Useful links to follow about this cross-compilation step:

- We’re due for a change to buildSystem to match the NixOS derivation buildPlatform once this PR is merged
- What do these parameters actually mean? Check the NixOS documentation here and the nixpkgs documentation here
  - buildPlatform is the machine doing the building
  - hostPlatform is the machine the built binary (possibly a compiler) runs on
  - targetPlatform is the machine a compiler running on hostPlatform will emit binaries for

# LTO

TODO: I haven’t gotten this to work yet.