Cache and defer scans to speed up class discovery
Hi @hpierce1102
First time contributing here, happy to take any feedback.
I was profiling php artisan lighthouse:ide-helper on a Laravel project running Lighthouse. The command took ~28 s. Xdebug pointed at ClassFinder rebuilding the PSR-4 namespace tree on every call.
I opened a PR on Lighthouse to swap the lib for a Composer-based custom finder (https://github.com/nuwave/lighthouse/pull/2768). The maintainer @spawnia prefers keeping ClassFinder and improving it upstream, which I agree is the cleaner path. This MR is that upstream effort and should close #14.
Three small commits, each addressing one hotspot. Numbers below are from the same lighthouse:ide-helper run, Xdebug profile mode.
I tried to stick to the existing conventions: PHPDoc-only type hints (no native types since the lib supports PHP 5.3+), array() syntax over short arrays, etc. Let me know if I missed anything.
1. Cache PSR4 namespaces and classmap entries per app root
PSR4NamespaceFactory::getPSR4Namespaces() and ClassmapEntryFactory::getClassmapEntries() are called multiple times per getClassesInNamespace() call (once in isNamespaceKnown, once in findBestPSR4Namespace). Each call rebuilt the full tree from scratch.
I cache the result keyed by app root (+ ignorePSR4Vendors for PSR-4). Keying by app root keeps the cache correct when setAppRoot() is called mid-process (existing tests do this).
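For reviewers who want the idea without opening the diff, here is a minimal sketch of the keyed cache. It is not the code in this MR: the class name CachedNamespaceFactorySketch, the method getNamespaces(), the buildCount counter and the key format are all hypothetical, and build() only stands in for the real composer.json parsing and directory scan.

```php
<?php
/**
 * Illustrative only: names here are hypothetical, not ClassFinder's API.
 */
class CachedNamespaceFactorySketch
{
    /** @var array results keyed by "appRoot|ignoreVendors" */
    private $cache = array();

    /** @var int how many times the expensive build really ran */
    public $buildCount = 0;

    /**
     * @param string $appRoot
     * @param bool $ignoreVendors
     * @return array
     */
    public function getNamespaces($appRoot, $ignoreVendors)
    {
        // The key includes everything that changes the result, so switching
        // app root or the vendor flag never returns a stale entry.
        $key = $appRoot . '|' . ($ignoreVendors ? '1' : '0');
        if (!isset($this->cache[$key])) {
            $this->cache[$key] = $this->build($appRoot, $ignoreVendors);
        }
        return $this->cache[$key];
    }

    /**
     * Stand-in for the expensive tree build (assumption, not the real logic).
     *
     * @param string $appRoot
     * @param bool $ignoreVendors
     * @return array
     */
    private function build($appRoot, $ignoreVendors)
    {
        $this->buildCount++;
        return array('root' => $appRoot, 'ignoreVendors' => $ignoreVendors);
    }
}

$factory = new CachedNamespaceFactorySketch();
$factory->getNamespaces('/srv/app', true);   // first call builds
$factory->getNamespaces('/srv/app', true);   // same key, served from cache
$factory->getNamespaces('/srv/other', true); // different app root, builds again
```

The sketch uses one composite string key rather than nested arrays so that adding another input later only means extending the key.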
Before: getPSR4Namespaces called 23 times, ~927 ms each. createNamespace 87,906 calls, scandir 88,170 calls. After: 1 build + 22 cache hits. createNamespace 3,831 calls.
~28 s -> ~2.0 s (x14)
2. Resolve direct subnamespaces lazily on first access
createNamespace was eagerly recursing into every subdirectory to build the subnamespace tree. But getDirectSubnamespaces() is only ever called from getClassesFromListRecursively (in RECURSIVE_MODE). In STANDARD_MODE, the tree is built and never used.
I move the recursion behind a resolver: setSubnamespacesResolver() registers a callable, getDirectSubnamespaces() invokes it on first access and memoizes. setDirectSubnamespaces() still works as before for callers that explicitly set the tree.
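A rough sketch of the resolver pattern described above, assuming a simplified namespace object. The three method names come from this MR; the class name LazyNamespaceSketch, the property names and the method bodies are hypothetical and only show the memoization, not the real directory recursion.

```php
<?php
/**
 * Illustrative only: a simplified stand-in for the namespace object.
 */
class LazyNamespaceSketch
{
    /** @var array|null null means "not resolved yet" */
    private $directSubnamespaces = null;

    /** @var callable|null */
    private $resolver = null;

    /**
     * @param callable $resolver returns the array of direct subnamespaces
     */
    public function setSubnamespacesResolver($resolver)
    {
        $this->resolver = $resolver;
    }

    /**
     * Explicitly set tree: takes precedence, the resolver is never invoked.
     *
     * @param array $subnamespaces
     */
    public function setDirectSubnamespaces($subnamespaces)
    {
        $this->directSubnamespaces = $subnamespaces;
    }

    /**
     * @return array
     */
    public function getDirectSubnamespaces()
    {
        if ($this->directSubnamespaces === null) {
            // First access: run the resolver once and memoize the result.
            $this->directSubnamespaces = $this->resolver !== null
                ? call_user_func($this->resolver)
                : array();
        }
        return $this->directSubnamespaces;
    }
}

$calls = 0;
$ns = new LazyNamespaceSketch();
$ns->setSubnamespacesResolver(function () use (&$calls) {
    $calls++; // counts how often the expensive scan would run
    return array('App\\Models', 'App\\Http');
});
$first = $ns->getDirectSubnamespaces();  // resolver runs here
$second = $ns->getDirectSubnamespaces(); // memoized, resolver not called again
```

If nothing ever calls getDirectSubnamespaces(), as in STANDARD_MODE, the closure is never executed and the scan cost is not paid.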
Before: createNamespace 3,831 calls per command run. After: only the namespaces actually queried get built.
~2.0 s -> ~1.5 s (cumulative x19)
3. Precompute direct namespace on ClassmapEntry construction
doesMatchDirectNamespace() ran explode + array_pop + implode on the class name for every entry on every STANDARD_MODE query. Profile showed 140,844 invocations on a single command run (12 queries x ~11,700 entries).
I compute the parent namespace once in the constructor with strrpos + substr and compare with a single string check.
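As a sketch of the precompute step: the class name ClassmapEntrySketch and its properties are hypothetical, and the example assumes class names and namespaces are passed without leading or trailing backslashes, which may not hold for every caller in the library.

```php
<?php
/**
 * Illustrative only: a simplified stand-in for a classmap entry.
 */
class ClassmapEntrySketch
{
    /** @var string */
    private $className;

    /** @var string parent namespace, computed once at construction */
    private $directNamespace;

    /**
     * @param string $className fully qualified, no leading backslash (assumption)
     */
    public function __construct($className)
    {
        $this->className = $className;
        // Everything before the last backslash is the parent namespace;
        // a class in the global namespace has none.
        $pos = strrpos($className, '\\');
        $this->directNamespace = $pos === false ? '' : substr($className, 0, $pos);
    }

    /**
     * @param string $namespace
     * @return bool
     */
    public function doesMatchDirectNamespace($namespace)
    {
        // One string comparison per query instead of explode/array_pop/implode.
        return $this->directNamespace === $namespace;
    }
}

$entry = new ClassmapEntrySketch('App\\Models\\User');
$matchesDirect = $entry->doesMatchDirectNamespace('App\\Models');
$matchesAncestor = $entry->doesMatchDirectNamespace('App');
```

Here $matchesDirect is true and $matchesAncestor is false, since only the immediate parent namespace counts as a direct match.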
Before: doesMatchDirectNamespace ~262 ms self time. After: ~55 ms.
~1.5 s -> ~1.4 s (cumulative x20)
Summary
| Step | Time | Speedup |
|---|---|---|
| Baseline | ~28 s | |
| + Cache | ~2.0 s | x14 |
| + Lazy subnamespaces | ~1.5 s | x19 |
| + Precompute classmap | ~1.4 s | x20 |
Tests
Four new unit tests cover the lazy resolver and the existing classmap edge cases.
Output of the Lighthouse integration scenario is byte-identical before and after.