Yep. If you want fast understanding without deep reading, you need to treat a repo like a living system and use many weak signals that compound into strong insight.
Here are creative, high-signal views that go way beyond LOC/language/test split—most are implementable with shallow parsing, git metadata, build graph introspection, and dependency info.
1) The Responsibility Map
You already have roles (prod/test/infra/docs/config/...). Go one level deeper:
Add a second axis: Boundary
core (domain logic)
edge (integrations/adapters)
api (public surface)
data (storage/migrations)
ui (presentation)
Why it’s powerful: you can instantly see “we’re mostly edge code” (fragile) vs “we’re mostly core” (durable).
How to infer cheaply: path conventions + file naming + imports (shallow) + module graph.
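A minimal sketch of the path-convention approach, in Python. The directory names in BOUNDARY_RULES are hypothetical conventions — swap in whatever your repo actually uses:

```python
import re

# Hypothetical path conventions -- adjust to your repo's layout.
BOUNDARY_RULES = [
    (re.compile(r"(^|/)(api|routes|handlers)(/|$)"), "api"),
    (re.compile(r"(^|/)(adapters|integrations|clients)(/|$)"), "edge"),
    (re.compile(r"(^|/)(db|storage|migrations)(/|$)"), "data"),
    (re.compile(r"(^|/)(ui|web|frontend)(/|$)"), "ui"),
]

def classify_boundary(path: str) -> str:
    """Map a file path to a boundary bucket; default to 'core'."""
    for pattern, boundary in BOUNDARY_RULES:
        if pattern.search(path):
            return boundary
    return "core"

print(classify_boundary("internal/adapters/stripe/client.go"))  # edge
print(classify_boundary("internal/billing/invoice.go"))         # core
```

Path rules alone will misclassify some files; layering in imports (e.g. "this file imports an HTTP client, so it's probably edge") tightens it up.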
2) Churn × Size “Heat Ledger” (the most underrated insight)
A file’s risk is closer to:
risk ≈ churn × coupling × criticality
Start with the easiest:
LOC per file
commits touching file in last 30/90 days
unique authors in last 90 days
Output a table of top 20 “hot files”:
“big and changing” (highest risk)
“small but changing constantly” (often config/edge bugs)
“big and never changing” (legacy monolith / hard-to-touch)
No deep reading required; git log --name-only gets you 80%.
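A sketch of the 80% version: feed it the output of `git log --name-only --pretty=format:%H` and it counts commits touching each file (the sample log below is fabricated for illustration):

```python
from collections import Counter

def churn_from_log(log_text: str) -> Counter:
    """Count commits touching each file, given
    `git log --name-only --pretty=format:%H` output."""
    churn = Counter()
    seen_in_commit = set()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # A 40-char hex line is a commit hash: a new commit starts here.
        if len(line) == 40 and all(c in "0123456789abcdef" for c in line):
            seen_in_commit = set()
        elif line not in seen_in_commit:
            churn[line] += 1
            seen_in_commit.add(line)
    return churn

sample = """\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
cmd/server/main.go
internal/config/config.go

bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
internal/config/config.go
"""
print(churn_from_log(sample).most_common(1))  # [('internal/config/config.go', 2)]
```

Join this with per-file LOC and you have the "big and changing" quadrant for free.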
3) The “Bus Factor” map (ownership, not people drama)
Compute for directories/modules:
% of changes by top contributor
# of contributors above N commits
Insight: “billing/ has bus factor 1” is more actionable than “billing/ has 8k LOC”.
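A sketch of the top-contributor share, given (author, directory) pairs you can extract from `git log` (the names and directories below are made up):

```python
from collections import Counter, defaultdict

def bus_factor(commits):
    """commits: iterable of (author, directory) pairs.
    Returns {directory: share of changes by the top contributor}."""
    per_dir = defaultdict(Counter)
    for author, directory in commits:
        per_dir[directory][author] += 1
    return {d: c.most_common(1)[0][1] / sum(c.values())
            for d, c in per_dir.items()}

commits = [("alice", "billing/"), ("alice", "billing/"), ("alice", "billing/"),
           ("alice", "auth/"), ("bob", "auth/")]
print(bus_factor(commits))  # {'billing/': 1.0, 'auth/': 0.5}
```

A share of 1.0 is the "bus factor 1" warning: one person owns every change.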
4) Boundary-Crossing Index (coupling proxy)
Look at dependency edges between modules/packages.
Create metrics:
fan_in (how many depend on this)
fan_out (how many this depends on)
“cross-boundary imports” (core importing edge is a smell)
Minimal implementation: parse imports at top of files (Go/TS) without full AST.
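Once you have import edges as (importer, imported) pairs, the metrics are a few lines. The module names here are placeholders:

```python
from collections import Counter

def fan_metrics(edges):
    """edges: (importer, imported) module pairs.
    Returns (fan_in, fan_out) Counters."""
    fan_in, fan_out = Counter(), Counter()
    for importer, imported in edges:
        fan_out[importer] += 1
        fan_in[imported] += 1
    return fan_in, fan_out

edges = [("api", "core"), ("edge", "core"), ("core", "edge")]
fan_in, fan_out = fan_metrics(edges)
print(fan_in["core"], fan_out["core"])  # 2 1

# Boundary inversion check: core should not import edge.
violations = [(a, b) for a, b in edges if a == "core" and b == "edge"]
print(violations)  # [('core', 'edge')]
```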
5) “API Surface Area” inventory
List and count:
exported Go symbols / public TS exports
routes / handlers
RPC/proto endpoints
CLI commands
config keys/env vars
Why it matters: public surfaces are where compatibility cost lives.
Bonus: split “public” vs “internal” and show ratio over time.
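For Go, a rough regex catches top-level exported functions and types (it deliberately ignores methods with receivers; the snippet below is fabricated):

```python
import re

# Heuristic: top-level exported Go funcs/types start with a capital letter.
EXPORTED = re.compile(r"^(?:func|type)\s+([A-Z]\w*)", re.MULTILINE)

go_src = """\
package billing

type Invoice struct{}

func NewInvoice() *Invoice { return nil }

func helper() {}
"""
print(EXPORTED.findall(go_src))  # ['Invoice', 'NewInvoice']
```

Counting these per package over time gives the public/internal ratio trend.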
6) Config Surface & Fragility Meter
Config is a silent killer.
Track:
count of env vars used
count of config keys
number of places config is read
“config complexity” = #keys × #environments × #consumers
Cheap signal: grep for os.Getenv, process.env, viper.Get*, flag.*, YAML reads.
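The grep, as a sketch covering two of the patterns named above (extend the list for viper, flag, YAML reads, etc.):

```python
import re

# Two of the config-read patterns; add more for your stack.
CONFIG_READS = [
    re.compile(r'os\.Getenv\("(\w+)"\)'),      # Go
    re.compile(r'process\.env\.(\w+)'),        # TS/JS
]

src = 'port := os.Getenv("PORT")\nurl := os.Getenv("DB_URL")'
keys = sorted({key for pattern in CONFIG_READS for key in pattern.findall(src)})
print(keys)  # ['DB_URL', 'PORT']
```

The count of distinct keys feeds the "config complexity" formula directly.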
7) Test Coverage Without Coverage (proxy coverage)
Even if you can’t run coverage, you can estimate test attention:
For each module/dir:
ratio of test files to prod files
ratio of test LOC to prod LOC
presence of e2e/integration harnesses
“orphan prod files” (no nearby tests)
This catches deserts quickly.
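The per-directory file ratio for a Go repo, as a sketch (the `_test.go` suffix is the Go convention; the paths are invented):

```python
import os
from collections import defaultdict

def proxy_coverage(paths):
    """Ratio of Go test files to prod files per directory.
    A ratio of 0.0 marks a test desert."""
    prod, test = defaultdict(int), defaultdict(int)
    for p in paths:
        d = os.path.dirname(p)
        if p.endswith("_test.go"):
            test[d] += 1
        elif p.endswith(".go"):
            prod[d] += 1
    return {d: test[d] / prod[d] for d in prod}

paths = ["billing/invoice.go", "billing/invoice_test.go", "auth/token.go"]
print(proxy_coverage(paths))  # {'billing': 1.0, 'auth': 0.0}
```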
8) “Time-to-Build” topology (build graph understanding)
Instead of reading code, learn the repo from its build system:
Go modules, packages
TS workspaces/turbo graph
Make targets
CI pipeline steps
Output:
dependency DAG summary
top-level entrypoints
“roots” and “leaves”
This often tells you how the system thinks of itself.
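Given dependency edges from any of those build systems, roots and leaves fall out by set arithmetic (edge data below is illustrative):

```python
def roots_and_leaves(edges):
    """edges: (dependent, dependency) pairs.
    Roots: nothing depends on them (likely entrypoints).
    Leaves: they depend on nothing (likely foundations)."""
    nodes = {n for edge in edges for n in edge}
    has_incoming = {dep for _, dep in edges}
    has_outgoing = {src for src, _ in edges}
    return sorted(nodes - has_incoming), sorted(nodes - has_outgoing)

edges = [("cmd/server", "internal/api"), ("internal/api", "internal/core")]
print(roots_and_leaves(edges))  # (['cmd/server'], ['internal/core'])
```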
9) Interface Inventory (where seams exist)
Seams are comprehension shortcuts.
Detect:
Go interfaces and their implementers (cheap via type X interface + var _ X = (*Y)(nil) patterns)
TS interfaces/types used across packages
protobuf/grpc/http client boundaries
Report:
“top interfaces by #implementations”
“top implementations by #interfaces satisfied”
This gives you the “shape” of architecture without reading logic.
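The two Go patterns mentioned above are regex-friendly; a sketch over a fabricated snippet:

```python
import re

IFACE = re.compile(r"type\s+(\w+)\s+interface")
# The `var _ X = (*Y)(nil)` compile-time assertion pattern.
IMPL = re.compile(r"var\s+_\s+(\w+)\s*=\s*\(\*(\w+)\)\(nil\)")

src = """\
type Store interface { Get(id string) }
var _ Store = (*PgStore)(nil)
var _ Store = (*MemStore)(nil)
"""
print(IFACE.findall(src))  # ['Store']
print(IMPL.findall(src))   # [('Store', 'PgStore'), ('Store', 'MemStore')]
```

Counting implementers per interface gives the "top interfaces by #implementations" list without touching any logic.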
10) Error Surface Map
Errors tell stories.
Count and cluster:
return fmt.Errorf / errors.New
logged error messages
sentinel errors
“panic” and “TODO/FIXME”
Make a “top error-producing modules” list.
It’s a surprisingly accurate fault-line detector.
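Counting those tokens per module is one regex and a Counter (module names and source text below are made up):

```python
import re
from collections import Counter

ERROR_TOKENS = re.compile(r"fmt\.Errorf|errors\.New|panic\(|TODO|FIXME")

def error_surface(files):
    """files: {module: source text}. Count error-ish tokens per module."""
    return Counter({m: len(ERROR_TOKENS.findall(src))
                    for m, src in files.items()})

files = {
    "billing": 'return fmt.Errorf("bad invoice")\n// TODO: handle refunds',
    "auth": 'panic("no key")',
}
print(error_surface(files).most_common())  # [('billing', 2), ('auth', 1)]
```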
11) Observability Surface
For distributed systems: look for instrumentation points:
trace/span creation
metrics emitted
logger usage patterns
Summarize:
“modules with tracing”
“modules without any instrumentation”
“top log volume candidates” (by #log statements)
This tells you where visibility will be good/bad before you ship.
12) “Generated vs Handwritten” truth
Generated code inflates LOC and hides complexity.
Detect:
header markers (“Code generated…”)
directories like gen/, pb/, dist/
tool outputs
Report:
total generated LOC
generated LOC by source tool (protobuf, openapi, sqlc, etc.)
This prevents false conclusions about codebase size.
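Go's convention makes the header marker easy to detect; combined with directory rules it looks like this (directory names are the hypothetical conventions from above):

```python
import re

# Go's standard generated-file marker, per the Go team's convention.
GENERATED = re.compile(r"^// Code generated .* DO NOT EDIT\.$", re.MULTILINE)

def is_generated(src: str, path: str) -> bool:
    gen_dirs = ("gen/", "pb/", "dist/")
    return bool(GENERATED.search(src)) or any(d in path for d in gen_dirs)

print(is_generated(
    "// Code generated by protoc-gen-go. DO NOT EDIT.\npackage pb", "x.pb.go"))
print(is_generated("package billing", "billing/invoice.go"))
```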
13) The “Change Risk Score” per directory
Composite score (simple, explainable):
size (LOC)
churn (commits last 90d)
coupling (fan-in/out)
low tests (test/prod ratio low)
high config touchpoints
low bus factor
Output top risky directories. This becomes your “where to invest” list.
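A sketch of one such composite — the weights and metric scales here are entirely made up and exist only to show the shape (higher test ratio and lower author share reduce risk):

```python
def risk_score(m, weights=None):
    """m: per-directory metrics dict. All weights are hypothetical;
    tune them until the ranking matches your intuition, then trust it."""
    w = weights or {"loc": 1, "churn": 2, "coupling": 2,
                    "tests": 3, "config": 1, "bus": 2}
    return (w["loc"] * m["loc"] / 1000        # size, in kLOC
            + w["churn"] * m["churn_90d"]     # commits, last 90 days
            + w["coupling"] * m["coupling"]   # fan_in + fan_out
            + w["tests"] * (1 - m["test_ratio"])
            + w["config"] * m["config_reads"]
            + w["bus"] * m["top_author_share"])

hot = {"loc": 4000, "churn_90d": 12, "coupling": 9, "test_ratio": 0.1,
       "config_reads": 5, "top_author_share": 0.9}
quiet = {"loc": 1500, "churn_90d": 1, "coupling": 2, "test_ratio": 0.8,
         "config_reads": 1, "top_author_share": 0.4}
print(risk_score(hot) > risk_score(quiet))  # True
```

The key property is explainability: each term maps to one of the metrics above, so "why is this directory risky?" always has a concrete answer.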
14) “Narrative diff” between two commits (Tufte-friendly)
Not “what changed”, but “what meaning changed”:
responsibilities shifted (infra up, test down)
API surface changed (#exports/routes)
risk score shifted
hot files list changed
bus factor changed
This is perfect for PR comments and release notes.
15) “Concept extraction” without deep reading (careful, but useful)
Do shallow token frequency on identifiers (not full semantics):
top nouns in filenames, package names, exported symbols
cluster by prefix/suffix patterns (User*, Auth*, Invoice*)
detect “domain vocabulary” and its concentration
This helps new readers orient: “auth is everywhere”, “billing is isolated”, etc.
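The token-frequency pass is a CamelCase/snake_case split plus a Counter (the identifiers below are invented):

```python
import re
from collections import Counter

def domain_vocabulary(identifiers):
    """Split CamelCase/snake_case identifiers into word tokens and count."""
    tokens = Counter()
    for ident in identifiers:
        for part in re.findall(r"[A-Z]?[a-z]+", ident):
            tokens[part.lower()] += 1
    return tokens

idents = ["UserAuth", "AuthToken", "InvoiceUser", "user_service"]
print(domain_vocabulary(idents).most_common(2))  # [('user', 3), ('auth', 2)]
```

Concentration of a token in one package ("billing is isolated") versus scattered everywhere ("auth is everywhere") is the orienting signal.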
16) Architectural smell detectors (cheap heuristics)
Signal-based, no AST required:
core importing infra (boundary inversion)
cyclic deps between packages
directories with many tiny files (high fragmentation)
directories with few huge files (monolith)
lots of utils/ growth (missing abstractions)
too many TODO/FIXME in one module
These are “X-rays” that guide deeper reading.
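Of these, cyclic deps is the one that needs more than a grep; a DFS over the same import edges used earlier does it (edge data is illustrative):

```python
from collections import defaultdict

def find_cycles(edges):
    """Detect cyclic package dependencies via DFS.
    edges: (src, dst) import pairs. Returns cycles as node paths."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    state = {}  # node -> 0 (on current path) or 1 (done)
    cycles = []

    def dfs(node, path):
        state[node] = 0
        for nxt in graph[node]:
            if state.get(nxt) == 0:               # back-edge: cycle found
                cycles.append(path[path.index(nxt):] + [nxt])
            elif nxt not in state:
                dfs(nxt, path + [nxt])
        state[node] = 1

    for n in list(graph):
        if n not in state:
            dfs(n, [n])
    return cycles

print(find_cycles([("a", "b"), ("b", "a"), ("b", "c")]))  # [['a', 'b', 'a']]
```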
If you want the single most Tufte-style report
The best single-page “insight artifact” is:
Responsibility breakdown
Ratios (test/prod, infra/prod, config/prod)
Risk leaderboard (top 10 dirs by change-risk score)
Hot files (big+churn)
Bus factor warnings
One sparkline trend (test/prod over 12 months)
Everything else is drilldown.
One suggestion that makes this feel magical
Treat every metric as a lens, and allow the user to “pivot”:
by role (infra)
by boundary (edge)
by timeframe (last 30 days)
by ownership (single-author areas)
by coupling (fan-in high)
Same data, different projection → genuine understanding.