
Code Health

Octokraft continuously measures the health of your codebase and distills it into a single score. That score is not a vague sentiment — it is a precise, auditable number built from every issue detected across your repositories. This page explains exactly how that number is calculated, what the categories mean, and how to interpret the results on your dashboard.

How Scoring Works

Size Normalization

Scores are normalized by codebase size. Octokraft measures issues per thousand lines of code (KLOC), so a 100,000-line repository with 50 issues scores the same as a 10,000-line repository with 5 issues. This prevents large codebases from being unfairly penalized and makes scores comparable across projects of different sizes.
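The normalization described above is simple enough to sketch directly (the exact internal calculation may differ):

```python
def issues_per_kloc(issue_count: int, lines_of_code: int) -> float:
    """Normalize a raw issue count to issues per thousand lines of code."""
    return issue_count / (lines_of_code / 1000)

# Both repositories from the text have the same density,
# so they receive the same score.
large = issues_per_kloc(50, 100_000)  # 0.5 issues per KLOC
small = issues_per_kloc(5, 10_000)    # 0.5 issues per KLOC
```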

The Scoring Curve

Health scores follow a smooth saturating curve:
score = 100 / (1 + penalty / 2.0)
This has two important properties:
  • Resilience at the top. A single bad issue does not tank your score to zero. Scores decline gradually, then plateau. A codebase with one critical vulnerability is not scored the same as one with fifty.
  • Diminishing returns. Going from a score of 90 to 95 requires more effort than going from 50 to 55. The last few points reflect genuine excellence, not just the absence of obvious problems.
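The curve is easy to evaluate directly. In this sketch, `penalty` stands for the size-normalized, severity-weighted issue total described elsewhere on this page:

```python
def health_score(penalty: float) -> float:
    """Map a non-negative penalty onto the 0-100 health scale."""
    return 100 / (1 + penalty / 2.0)

# No penalty scores a perfect 100; the score halves at penalty = 2.0
# and keeps falling gradually rather than dropping straight to zero.
print(health_score(0.0))  # 100.0
print(health_score(2.0))  # 50.0
```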

Overall Score

Your overall health score is a weighted average of all 8 category scores. Categories that represent higher risk to your system carry more weight.
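A weighted average over the 8 categories can be sketched as follows. Only the Security (2.0x) and Compliance (0.5x) weights appear on this page; the other values here are illustrative placeholders that follow the "higher risk, higher weight" ordering described below, not Octokraft's actual configuration:

```python
# Hypothetical weights: only "security" (2.0) and "compliance" (0.5) are
# stated on this page; the rest are illustrative placeholders.
WEIGHTS = {
    "security": 2.0, "runtime_risks": 1.5, "test_coverage": 1.5,
    "code_smells": 1.0, "duplication": 0.8, "dead_code": 0.8,
    "consistency": 0.8, "compliance": 0.5,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted average of the 8 per-category health scores."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[cat] * score for cat, score in category_scores.items())
    return weighted / total_weight
```

Because of the weights, a drop in a high-risk category such as Security moves the overall score more than the same drop in Compliance.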

The 8 Categories

Every issue detected in your codebase is classified into one of 8 categories. Each category has a weight that reflects its impact on system reliability, and a baseline threshold that defines the acceptable issue density.

Security

What it measures: Vulnerabilities that could compromise your system, expose user data, or allow unauthorized access.

Why it is weighted highest: A single security vulnerability can compromise your entire system. No amount of clean code or good architecture matters if an attacker can bypass authentication or inject malicious queries. Critical security issues have zero tolerance in scoring: even one causes significant score impact.

Issues in this category:
  • SQL injection and other injection attacks
  • Hardcoded credentials and secrets in source code
  • Missing input validation on user-facing endpoints
  • Insecure deserialization
  • Exposed API keys or tokens
  • Authentication and authorization bypasses
  • Cross-site scripting (XSS) vectors
Baseline thresholds: Zero tolerance for critical issues (0.0 per KLOC). Very strict for high-severity issues (0.05 per KLOC).

Example: A database query that interpolates user input directly into a SQL string instead of using parameterized queries is flagged as a critical security issue.
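The SQL injection example above, sketched with Python's standard sqlite3 module (the table and data are hypothetical):

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # FLAGGED: user input interpolated directly into the SQL string.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str):
    # OK: parameterized query; the driver escapes the value.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# The interpolated version lets a crafted input match every row;
# the parameterized version treats the same input as a literal string.
print(find_user_unsafe(conn, "x' OR '1'='1"))  # [('alice',)]
print(find_user_safe(conn, "x' OR '1'='1"))    # []
```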

Runtime Risks

What it measures: Code patterns that can crash your application or cause failures in production.

Why it is weighted high: Runtime issues are the difference between software that works and software that fails under real conditions. Unlike code smells, which slow development, runtime risks cause outages.

Issues in this category:
  • Null pointer dereferences and unhandled nil access
  • Unhandled exceptions and missing error propagation
  • Resource leaks — unclosed database connections, file handles, network sockets
  • Race conditions in concurrent code
  • Type coercion errors that fail silently
  • Infinite loops and unbounded recursion
  • Memory leaks from retained references
Example: A function opens a database connection, performs a query, but does not close the connection in the error path. Under sustained load, the connection pool is exhausted and the service stops responding.
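The connection-leak example above can be sketched like this; the fix is to release the resource on the error path as well, with `finally` or a context manager:

```python
import sqlite3

def leaky_query(db_path: str, sql: str):
    # FLAGGED: if execute() raises, the connection is never closed.
    conn = sqlite3.connect(db_path)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

def safe_query(db_path: str, sql: str):
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()  # runs on both the success path and the error path
```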

Test Coverage

What it measures: Whether your tests actually verify the behavior of your code, not just whether tests exist.

Why it is weighted high: Untested code breaks in production without warning. Tests are your safety net for every future change. A codebase without meaningful test coverage cannot be changed with confidence.

What Octokraft evaluates:
  • Test presence per module — do modules that contain business logic also contain tests?
  • Structural coverage — what percentage of exported functions are exercised by test callers?
  • Assertion density — how many assertions per test function? A test with no assertions verifies nothing.
  • Mock usage patterns — are tests isolated, or do they depend on external services?
Important distinction: Octokraft does not simply measure line coverage. A test file that calls functions but never asserts on their output provides false confidence. Octokraft measures whether your tests actually verify behavior.

Example: A module exports 12 public functions. Tests exist, but they only exercise 3 of those functions. The other 9 have zero test callers and would break silently on any change.
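A minimal illustration of the assertion-density distinction (the function and tests are hypothetical):

```python
def add(a: int, b: int) -> int:
    return a + b

def test_add_without_assertions():
    # Executes the code and "passes", but verifies nothing:
    # zero assertions, so line coverage here is false confidence.
    add(2, 3)

def test_add_with_assertions():
    # Actually verifies behavior.
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
```

Both tests produce identical line coverage, but only the second one would catch a regression in `add`.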

Code Smells

What it measures: Anti-patterns that slow development and increase the probability of bugs.

Why it is baseline weight: Code smells do not directly cause production failures, but they make your codebase harder to understand, harder to change, and more likely to accumulate bugs over time. They represent the chronic condition of technical debt.

Issues in this category:
  • God classes — classes with too many methods or responsibilities
  • Deeply nested conditionals that are difficult to reason about
  • Overly complex functions (high cyclomatic complexity)
  • Long parameter lists that indicate poor abstraction
  • Feature envy — functions that use more data from another module than their own
  • Primitive obsession — using raw strings and integers where domain types should exist
Example: A single controller class handles user authentication, payment processing, email sending, and report generation. Any change to one concern risks breaking another.
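One of the smells listed above, deeply nested conditionals, is easy to show in miniature; both versions below (hypothetical discount logic) behave identically, but the guard-clause form is flat enough to reason about at a glance:

```python
def discount_nested(user):
    # FLAGGED: deeply nested conditionals are hard to reason about.
    if user is not None:
        if user.get("active"):
            if user.get("orders", 0) > 10:
                return 0.2
            else:
                return 0.1
        else:
            return 0.0
    else:
        return 0.0

def discount_guarded(user):
    # Same behavior, flattened with guard clauses.
    if user is None or not user.get("active"):
        return 0.0
    return 0.2 if user.get("orders", 0) > 10 else 0.1
```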

Duplication

What it measures: Repeated code that should be consolidated.

Why it is weighted lower: Duplicated code is technical debt, but it does not directly cause bugs. The risk is indirect: when a bug is fixed in one copy but not the other, or when behavior diverges between copies over time.

Issues in this category:
  • Copy-paste code blocks across files
  • Similar logic repeated with minor variations
  • Redundant implementations of the same algorithm
  • Duplicated validation rules across endpoints
Example: The same date-parsing logic appears in three different services. A timezone bug is fixed in one but not the other two, causing inconsistent behavior across the application.
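The date-parsing example above, sketched with hypothetical service helpers; the point of consolidation is that a timezone bug gets fixed in exactly one place:

```python
from datetime import datetime, timezone

# FLAGGED: the same parsing logic copy-pasted into multiple services.
def parse_invoice_date(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)

def parse_shipping_date(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)

# Consolidated: one implementation shared by every caller.
def parse_date(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
```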

Dead Code

What it measures: Code that exists but is never executed.

Why it is weighted lower: Dead code does not cause bugs; it causes confusion. Developers waste time reading, maintaining, and working around code that serves no purpose. It inflates the codebase and creates misleading search results.

How Octokraft detects it: Octokraft builds a call graph across your codebase. If an exported function has no callers and no references anywhere in the project, it is flagged as dead code. Internal or private functions are not flagged, since they may be intended for near-term use.

Issues in this category:
  • Functions that are never called from anywhere in the codebase
  • Variables assigned but never read
  • Unreachable code paths after early returns or throws
  • Unused imports and dependencies
  • Feature-flagged code where the flag has been permanently disabled
Example: A utility function formatCurrency() was replaced by a library call six months ago. The original function still exists, still shows up in search results, and new developers occasionally call it instead of the library version.
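Octokraft's actual call graph spans files and languages; a toy single-module version of the same idea, using Python's `ast` module, looks like this:

```python
import ast

def unreferenced_top_level_functions(source: str) -> set[str]:
    """Toy sketch: flag top-level functions whose names are never
    loaded anywhere else in the module."""
    tree = ast.parse(source)
    defined = {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}
    referenced = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return defined - referenced

module = """
def format_currency(cents):
    return f"${cents / 100:.2f}"

def total(prices):
    return sum(prices)

print(total([1, 2, 3]))
"""
print(unreferenced_top_level_functions(module))  # {'format_currency'}
```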

Consistency

What it measures: Whether your codebase follows its own established patterns.

Why it is weighted lower: Inconsistency rarely causes bugs directly, but it significantly slows onboarding, increases cognitive load, and makes code review harder. When every file handles errors differently, developers cannot build reliable intuitions about how the codebase behaves.

Issues in this category:
  • Mixed naming conventions (camelCase in some files, snake_case in others)
  • Inconsistent error handling patterns across modules
  • Different approaches to the same problem in different parts of the codebase
  • Naming patterns that deviate from established project conventions
How this connects to Conventions: Issues in this category are often generated by comparing code against the conventions Octokraft has detected in your codebase (see Architecture Intelligence). If 90% of your functions follow one naming convention, the other 10% are flagged here.

Example: Most of the codebase returns errors using a Result type, but three modules use thrown exceptions instead. A developer working across modules has to remember two different error handling paradigms.

Compliance

What it measures: Adherence to framework best practices and coding standards.

Why it is weighted lowest: Standards violations are the least likely to cause production issues. They represent deviations from recommended practices rather than concrete risks. However, they still matter for long-term maintainability.

Issues in this category:
  • Framework best practice violations
  • Coding standard deviations
  • Missing or incomplete documentation on public APIs
  • Deprecated API usage
Example: A project uses a web framework that recommends middleware-based authentication, but several endpoints implement auth checks inline. The code works, but it diverges from the framework’s intended patterns and is harder to audit.

Severity Levels

Every issue has a severity level that determines how heavily it impacts your health score. Severity carries a penalty multiplier:
| Severity | Multiplier | Meaning |
| --- | --- | --- |
| Critical | 5.0x | Must fix immediately. Data loss, security breach, or system crash. |
| High | 3.0x | Should fix before merge. Significant bugs or performance issues. |
| Medium | 1.5x | Fix when possible. Code quality issues that accumulate over time. |
| Low | 1.0x | Nice to fix. Minor improvements that incrementally improve the codebase. |
| Info | 0.3x | Awareness only. Suggestions and observations, not problems. |
The combination of category weight and severity multiplier determines the actual impact on your score. A critical security issue (2.0x category weight times 5.0x severity multiplier = 10.0x impact) affects your score far more than an info-level compliance suggestion (0.5x times 0.3x = 0.15x impact).
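The two worked examples from the paragraph above, computed directly:

```python
SEVERITY_MULTIPLIER = {
    "critical": 5.0, "high": 3.0, "medium": 1.5, "low": 1.0, "info": 0.3,
}

def issue_impact(category_weight: float, severity: str) -> float:
    """Combined impact factor: category weight times severity multiplier."""
    return category_weight * SEVERITY_MULTIPLIER[severity]

print(issue_impact(2.0, "critical"))  # 10.0  (critical security issue)
print(issue_impact(0.5, "info"))      # 0.15  (info-level compliance note)
```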

Grades

Your health score maps to a letter grade:
| Grade | Score Range |
| --- | --- |
| A+ | 95–100 |
| A | 90–94 |
| A- | 85–89 |
| B+ | 80–84 |
| B | 75–79 |
| B- | 70–74 |
| C+ | 65–69 |
| C | 60–64 |
| C- | 55–59 |
| D+ | 50–54 |
| D | 45–49 |
| D- | 40–44 |
| F | Below 40 |
Most well-maintained codebases land in the B to A- range. A+ requires consistently low issue density across all categories. F indicates systemic problems that need immediate attention.
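The grade table is a straightforward cutoff lookup:

```python
GRADE_CUTOFFS = [
    (95, "A+"), (90, "A"), (85, "A-"), (80, "B+"), (75, "B"), (70, "B-"),
    (65, "C+"), (60, "C"), (55, "C-"), (50, "D+"), (45, "D"), (40, "D-"),
]

def letter_grade(score: float) -> str:
    """Map a 0-100 health score onto the letter-grade table."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"

print(letter_grade(92))  # A
print(letter_grade(39))  # F
```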

When Assessments Run

1. Initialization: When you first connect repositories to Octokraft, a full assessment runs across the entire codebase. This establishes your baseline scores.

2. On PR merge: When a pull request merges to your default branch, Octokraft runs an incremental assessment that analyzes only the changed files and recalculates scores. This keeps your dashboard current without re-analyzing the entire codebase.

3. Manual trigger: You can trigger a full reassessment at any time from the dashboard or via the API.

Drift Alerts

When your health score drops between assessments, Octokraft generates a drift alert. Each alert shows:
  • The score before and after the change
  • Which categories were affected
  • Which specific issues caused the decline
  • The pull request or event that triggered the change
Drift alerts catch gradual quality degradation — the kind where each individual PR looks fine in isolation, but the cumulative effect over weeks or months is a steadily declining codebase. By the time someone notices “the code feels worse,” dozens of small regressions have compounded. Drift alerts surface each one as it happens.

Dismiss Rules

Not every finding applies to every codebase. Some issues are intentional trade-offs, legacy code scheduled for removal, or patterns specific to your domain. Dismiss rules let you suppress specific findings without losing visibility.

Global Dismiss

Suppress a specific issue type everywhere in the project. Use this for patterns you have deliberately chosen to allow.

File-Scoped Dismiss

Suppress an issue only for a specific file. Use this for legacy code, generated files, or modules scheduled for replacement.
Dismissed issues are still detected and recorded for audit purposes. They simply do not count toward your health scores. You can review and reverse dismiss rules at any time.

Where Findings Come From

Issues on your dashboard come from 5 complementary sources, all merged into a unified view:
| Source | What It Detects |
| --- | --- |
| Static analyzers | Language-specific linting rules, syntax issues, known anti-patterns |
| AI analysis | Semantic issues that require understanding intent: logic errors, missing edge cases, architectural concerns |
| Code graph | Structural problems detected from cross-file analysis: dead code, circular dependencies, coupling metrics |
| Convention deviations | Code that breaks patterns established elsewhere in your codebase |
| Architecture review | System-level design issues: modularity violations, scaling bottlenecks, pattern inconsistencies |
You do not need to configure these sources individually. Octokraft runs all applicable analyzers automatically based on the languages detected in your repository.

What Gets Ignored

Standard non-source directories are automatically excluded from analysis:
  • node_modules/, .venv/, vendor/ — dependency directories
  • dist/, build/, target/, bin/ — build output
  • __pycache__/, .pytest_cache/ — runtime caches
  • coverage/ — test coverage output
  • .git/ — version control internals
These are build artifacts and third-party dependencies, not your code. Analyzing them would inflate issue counts and produce meaningless results. Octokraft focuses exclusively on code your team writes and maintains.