Cache and defer scans to speed up class discovery

Hi @hpierce1102 👋

First time contributing here, happy to take any feedback.

I was profiling php artisan lighthouse:ide-helper on a Laravel project running Lighthouse. The command took ~28 s. Xdebug pointed at ClassFinder rebuilding the PSR-4 namespace tree on every call.

I opened a PR on Lighthouse to swap the lib for a Composer-based custom finder (https://github.com/nuwave/lighthouse/pull/2768). The maintainer @spawnia prefers keeping ClassFinder and improving it upstream, which I agree is the cleaner path. This MR is that upstream effort, so it should close #14 (closed) 😃

Three small commits, each addressing one hotspot. The numbers below all come from the same lighthouse:ide-helper run, with Xdebug in profile mode.

I tried to stick to the existing conventions: PHPDoc-only type hints (no native types since the lib supports PHP 5.3+), array() syntax over short arrays, etc. Let me know if I missed anything.

1. Cache PSR4 namespaces and classmap entries per app root

PSR4NamespaceFactory::getPSR4Namespaces() and ClassmapEntryFactory::getClassmapEntries() are called multiple times per getClassesInNamespace() call (once in isNamespaceKnown, once in findBestPSR4Namespace). Each call rebuilt the full tree from scratch.

I cache the result keyed by app root (+ ignorePSR4Vendors for PSR-4). Re-keying lets the cache stay correct when setAppRoot() is called mid-process (existing tests do this).
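To illustrate the keying idea, here is a minimal sketch rather than the actual patch. The class name CachingNamespaceFactory, the method getNamespaces(), the buildCount counter and the placeholder build() body are all hypothetical; only the idea of keying the cache on app root plus the vendor flag comes from the description above.

```php
<?php

/**
 * Hypothetical stand-in for a factory that caches per app root.
 * None of these names are the library's real internals.
 */
class CachingNamespaceFactory
{
    /** @var array Cached results, keyed by app root and vendor flag. */
    private $cache = array();

    /** @var int Number of full rebuilds, exposed only for this demo. */
    public $buildCount = 0;

    /**
     * @param string $appRoot
     * @param bool   $ignoreVendors
     * @return array
     */
    public function getNamespaces($appRoot, $ignoreVendors)
    {
        // Both inputs are part of the key, so switching the app root
        // mid-process never returns entries built for another root.
        $key = $appRoot . '|' . ($ignoreVendors ? '1' : '0');

        if (!isset($this->cache[$key])) {
            $this->cache[$key] = $this->build($appRoot, $ignoreVendors);
        }

        return $this->cache[$key];
    }

    /**
     * Placeholder for the expensive filesystem scan (assumption).
     *
     * @param string $appRoot
     * @param bool   $ignoreVendors
     * @return array
     */
    private function build($appRoot, $ignoreVendors)
    {
        $this->buildCount++;

        return array($appRoot . '/src');
    }
}

$factory = new CachingNamespaceFactory();
$factory->getNamespaces('/app', false);   // first call: full build
$factory->getNamespaces('/app', false);   // same key: cache hit
$factory->getNamespaces('/other', false); // new app root: rebuilt
echo $factory->buildCount, "\n";          // prints 2
```

Re-keying instead of flushing means an earlier app root stays cached if a test switches back to it.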

Before: getPSR4Namespaces called 23 times, ~927 ms each. createNamespace 87,906 calls, scandir 88,170 calls. After: 1 build + 22 cache hits. createNamespace 3,831 calls.

~28 s -> ~2.0 s (x14)

2. Resolve direct subnamespaces lazily on first access

createNamespace was eagerly recursing into every subdirectory to build the subnamespace tree. But getDirectSubnamespaces() is only ever called from getClassesFromListRecursively (in RECURSIVE_MODE). In STANDARD_MODE, the tree is built and never used.

I move the recursion behind a resolver: setSubnamespacesResolver() registers a callable, getDirectSubnamespaces() invokes it on first access and memoizes. setDirectSubnamespaces() still works as before for callers that explicitly set the tree.
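As a rough sketch of the resolver pattern, assuming a much simpler namespace object than the real one: the class LazyNamespace and the $scans counter are hypothetical, and only the three method names are taken from the description above.

```php
<?php

/**
 * Hypothetical, simplified namespace node with lazy subnamespaces.
 */
class LazyNamespace
{
    /** @var array|null Null until resolved or explicitly set. */
    private $directSubnamespaces = null;

    /** @var callable|null */
    private $resolver = null;

    /**
     * @param callable $resolver Returns the array of direct subnamespaces.
     */
    public function setSubnamespacesResolver($resolver)
    {
        $this->resolver = $resolver;
    }

    /**
     * @param array $subnamespaces
     */
    public function setDirectSubnamespaces($subnamespaces)
    {
        // An explicit tree wins; the resolver is then never invoked.
        $this->directSubnamespaces = $subnamespaces;
    }

    /**
     * @return array
     */
    public function getDirectSubnamespaces()
    {
        if ($this->directSubnamespaces === null) {
            // First access: run the resolver once and memoize the result.
            $this->directSubnamespaces = $this->resolver === null
                ? array()
                : call_user_func($this->resolver);
            $this->resolver = null;
        }

        return $this->directSubnamespaces;
    }
}

$scans = 0;
$node = new LazyNamespace();
$node->setSubnamespacesResolver(function () use (&$scans) {
    $scans++; // stands in for the recursive directory scan
    return array('App\\Models', 'App\\Http');
});

$before = $scans;                        // still 0: nothing scanned yet
$first = $node->getDirectSubnamespaces();
$second = $node->getDirectSubnamespaces();
echo $scans, "\n";                       // prints 1
```

In this sketch a node that is never asked for its children, as in STANDARD_MODE, never triggers a scan.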

Before: createNamespace 3,831 calls per command run. After: only the namespaces actually queried get built.

~2.0 s -> ~1.5 s (cumulative x19)

3. Precompute direct namespace on ClassmapEntry construction

doesMatchDirectNamespace() ran explode + array_pop + implode on the class name for every entry on every STANDARD_MODE query. Profile showed 140,844 invocations on a single command run (12 queries x ~11,700 entries).

I compute the parent namespace once in the constructor with strrpos + substr and compare with a single string check.
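A minimal sketch of the precompute step, under the assumption that the entry only needs its class name: the class PrecomputedEntry is hypothetical, and trimming backslashes from the queried namespace is my assumption about how callers pass it in.

```php
<?php

/**
 * Hypothetical, reduced classmap entry that stores its parent namespace.
 */
class PrecomputedEntry
{
    /** @var string */
    private $className;

    /** @var string Parent namespace, computed once in the constructor. */
    private $directNamespace;

    /**
     * @param string $className Fully qualified class name.
     */
    public function __construct($className)
    {
        $this->className = $className;

        // Everything before the last separator is the parent namespace.
        // A class in the global namespace has no separator at all.
        $position = strrpos($className, '\\');
        $this->directNamespace = $position === false
            ? ''
            : substr($className, 0, $position);
    }

    /**
     * @param string $namespace
     * @return bool
     */
    public function doesMatchDirectNamespace($namespace)
    {
        // One string comparison per query instead of
        // explode + array_pop + implode.
        return $this->directNamespace === trim($namespace, '\\');
    }
}

$entry = new PrecomputedEntry('App\\Models\\User');
var_dump($entry->doesMatchDirectNamespace('App\\Models')); // true
var_dump($entry->doesMatchDirectNamespace('App'));         // false
```

The per-entry cost moves to construction, which happens once, so repeated queries only pay for the comparison.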

Before: doesMatchDirectNamespace ~262 ms self time. After: ~55 ms.

~1.5 s -> ~1.4 s (cumulative x20)

Summary

| Step | Time | Speedup |
| --- | --- | --- |
| Baseline | ~28 s | |
| + Cache | ~2.0 s | x14 |
| + Lazy subnamespaces | ~1.5 s | x19 |
| + Precompute classmap | ~1.4 s | x20 |

Tests

Four new unit tests cover the lazy resolver and the existing classmap edge cases.

Output of the Lighthouse integration scenario is byte-identical before and after.

Edited by Grégory Gérard
