HashMap 导致的性能劣化问题

最近我在为 idl formatter 添加了一个简单的 C++ formatter。但是在格式化一个两千行左右的文件需要 11s 左右。最后调查发现问题源自对 HashMap 的误用。

项目介绍

项目使用 tree-sitter 实现格式化功能。主要基本流程为：

解析源文件。
对源文件进行标记。（比如 append-space, append-newline 等）
根据标记重构语法树。

在标记时，我使用了 HashMap 储存节点的标记信息：

#[derive(Debug, Default, Clone, Hash, PartialEq, Eq)]
struct NodeKey {
    pub range: Range<usize>,
    pub kind: u16,
}

#[derive(Debug, Clone, Copy)]
pub enum QueryType {
    IdentInc,
    PrependIdentDec,
    IdentDec,
    IdentReset,
    IdentIgnore,
    PrependSpace,
    PrependNewline,
    AppendSpace,
    AppendNewline,
    Keep,
    PrependRmNewline,
}

let mut infos: HashMap<NodeKey, Vec<QueryType>> = Default::default();

使用 HashMap 主要有两方面的考虑：

HashMap 查询时间复杂度为 \(O(1)\)，性能一般比 BTreeMap 要好。
Range 结构体没有实现 Ord 接口，需要手动实现。而 Hash 可以使用派生宏实现，比较简单。

由于需要遍历 ) } } } ) 语法树，因此自然使用递归的方式进行操作，时间复杂度为 \(O(n)\)，在每次递归时需要查询应用到当前节点的标记信息：

pub fn walk( builder: &mut StringBuilder, root: Node, info: &HashMap<NodeKey, Vec<&str>>, source: &[u8], class=tok-w> -> anyhow::Result<()> { let range = root.byte_range(); let mut keep = false; info.iter() .filter(|(key, _)| key.kind == root.kind_id()) .filter(|(key, _)| range.start >= key.range.start && range.end <= key.range.end) .flat_map(|item| item.1) .for_each(|item| match *item { QUERY_IDENT_IGNORE => { builder.ignore_ident = true; } QUERY_KEEP => { keep = true; } _ => {} }); // ... }

由于进行了一次遍历，因此总的时间复杂度为 \(O(n^2)\)。

性能剖析

使用 perf 对性能分析，发现命中点最多的是 ts_language_public_symbol 函数。但是此函数只是进行了简单的数组索引。在栈图中查找此函数，发现整个栈都是黄色的，意味着此函数调用非常频繁。

观看火焰图发现耗时最多的几个函数都在 HashMap 中。尝试手动实现 Hash 函数，但是没有效果。

在函数中查找热点路径，发现 walk 函数中 HashMap 耗时非常客观，将此函数注释掉后函数耗时下降到 70ms。可见此函数是优化的关键。

结论

在此项目中，HashMap 的主要用途实际上是查找指定范围内的标记，而 HashMap 对范围查找并不擅长，导致性能劣化到 \(O(n)\)，考虑到 NodeKey 代表的实际上是 Node 在源码中的位置，因此可以考虑使用 BTreeMap 操作。BTreeMap 由于使用二分查找，可用将时间复杂度优化到 \(O(\lg n)\)。

最终结果如下：

#[derive(Debug, Default, Clone, PartialEq, Eq)] struct NodeKey { pub range: Range<usize>, pub kind: u16, impl PartialOrd for NodeKey { fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> { Some(self.cmp(other)) } impl Ord for NodeKey { fn cmp(&self, other: &Self) -> std::cmp::Ordering { match self.range.start.cmp(&other.range.start) { std::cmp::Ordering::Less => std::cmp::Ordering::Less, std::cmp::Ordering::Greater => std::cmp::Ordering::Greater, std::cmp::Ordering::Equal => match self.range.end.cmp(&other.range.end) { std::cmp::Ordering::Less => std::cmp::Ordering::Less, std::cmp::Ordering::Greater => std::cmp::Ordering::Greater, std::cmp::Ordering::Equal => self.kind.cmp(&other.kind), }, } } >let mut infos: BTreeMap<NodeKey, Vec<QueryType>> = Default::default(); pub fn walk( builder: &mut StringBuilder, root: Node, info: &BTreeMap<NodeKey, Vec<QueryType>>, source: &[u8], class=tok-w> -> anyhow::Result<()> { let range = root.byte_range(); let root_kind_id = root.kind_id(); let root_text = root.utf8_text(source)?; trace!(?root_text); let mut keep = false; let range = { let l = NodeKey { range: range.clone(), kind: root_kind_id, }; let r = NodeKey { range: std::ops::Range { start: range.start + 1, end: range.end, }, kind: root_kind_id, }; use std::ops::Bound::Excluded; use std::ops::Bound::Included; (Included(l), Excluded(r)) }; info.range(range) .flat_map(|item| item.1) .for_each(|item| match *item { QueryType::IdentIgnore => { builder.ignore_ident = true; } QueryType::Keep => { keep = true; } _ => {} }); }

进行优化后时间优化到 60ms。

其他优化

其他的一些优化包括：

将 QueryType 从字符串更改为枚举。时间优化了 1s 左右。
对 kind\_id, node.source 进行缓存。时间优化了 1s 左右。