
Code Health

Octokraft continuously measures the health of your codebase and distills it into a single score. That score is not a vague sentiment — it is a precise, auditable number built from every issue detected across your repositories. This page explains exactly how that number is calculated, what the categories mean, and how to interpret the results on your dashboard.

How Scoring Works

Size Normalization

Scores are normalized by codebase size. Octokraft measures issues per thousand lines of code (KLOC), so a 100,000-line repository with 50 issues scores the same as a 10,000-line repository with 5 issues. This prevents large codebases from being unfairly penalized and makes scores comparable across projects of different sizes.
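The normalization described above is simple enough to sketch directly (the exact internal calculation may differ):

```python
def issues_per_kloc(issue_count: int, lines_of_code: int) -> float:
    """Normalize a raw issue count to issues per thousand lines of code."""
    return issue_count / (lines_of_code / 1000)

# Both repositories from the text have the same density,
# so they receive the same score.
large = issues_per_kloc(50, 100_000)  # 0.5 issues per KLOC
small = issues_per_kloc(5, 10_000)    # 0.5 issues per KLOC
```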

The Scoring Curve

Health scores follow a smooth saturating curve:
score = 100 / (1 + penalty / 2.0)
This has two important properties:
  • Resilience at the top. A single bad issue does not tank your score to zero. Scores decline gradually, then plateau. A codebase with one critical vulnerability is not scored the same as one with fifty.
  • Diminishing returns. Going from a score of 90 to 95 requires more effort than going from 50 to 55. The last few points reflect genuine excellence, not just the absence of obvious problems.
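The curve is easy to evaluate directly. In this sketch, `penalty` stands for the size-normalized, severity-weighted issue total described elsewhere on this page:

```python
def health_score(penalty: float) -> float:
    """Map a non-negative penalty onto the 0-100 health scale."""
    return 100 / (1 + penalty / 2.0)

# No penalty scores a perfect 100; the score halves at penalty = 2.0
# and keeps falling gradually rather than dropping straight to zero.
print(health_score(0.0))  # 100.0
print(health_score(2.0))  # 50.0
```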

Overall Score

Your overall health score is a weighted average of all 8 category scores. Categories that represent higher risk to your system carry more weight.
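A weighted average over the 8 categories can be sketched as follows. Only the Security (2.0x) and Compliance (0.5x) weights appear on this page; the other values here are illustrative placeholders that follow the "higher risk, higher weight" ordering described below, not Octokraft's actual configuration:

```python
# Hypothetical weights: only "security" (2.0) and "compliance" (0.5) are
# stated on this page; the rest are illustrative placeholders.
WEIGHTS = {
    "security": 2.0, "runtime_risks": 1.5, "test_coverage": 1.5,
    "code_smells": 1.0, "duplication": 0.8, "dead_code": 0.8,
    "consistency": 0.8, "compliance": 0.5,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted average of the 8 per-category health scores."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[cat] * score for cat, score in category_scores.items())
    return weighted / total_weight
```

Because of the weights, a drop in a high-risk category such as Security moves the overall score more than the same drop in Compliance.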

The 8 Categories

Every issue detected in your codebase is classified into one of 8 categories. Each category has a weight that reflects its impact on system reliability, and a baseline threshold that defines the acceptable issue density.

Security

What it measures: Vulnerabilities that could compromise your system, expose user data, or allow unauthorized access.

Why it is weighted highest: A single security vulnerability can compromise your entire system. No amount of clean code or good architecture matters if an attacker can bypass authentication or inject malicious queries. Critical security issues have zero tolerance in scoring: even one causes significant score impact.

Issues in this category:
  • SQL injection and other injection attacks
  • Hardcoded credentials and secrets in source code
  • Missing input validation on user-facing endpoints
  • Insecure deserialization
  • Exposed API keys or tokens
  • Authentication and authorization bypasses
  • Cross-site scripting (XSS) vectors
Baseline thresholds: Zero tolerance for critical issues (0.0 per KLOC). Very strict for high-severity issues (0.05 per KLOC).

Example: A database query that interpolates user input directly into a SQL string instead of using parameterized queries is flagged as a critical security issue.
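The SQL injection example above, sketched with Python's standard sqlite3 module (the table and data are hypothetical):

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # FLAGGED: user input interpolated directly into the SQL string.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str):
    # OK: parameterized query; the driver escapes the value.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# The interpolated version lets a crafted input match every row;
# the parameterized version treats the same input as a literal string.
print(find_user_unsafe(conn, "x' OR '1'='1"))  # [('alice',)]
print(find_user_safe(conn, "x' OR '1'='1"))    # []
```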

Runtime Risks

What it measures: Code patterns that can crash your application or cause failures in production.

Why it is weighted high: Runtime issues are the difference between software that works and software that fails under real conditions. Unlike code smells, which slow development, runtime risks cause outages.

Issues in this category:
  • Null pointer dereferences and unhandled nil access
  • Unhandled exceptions and missing error propagation
  • Resource leaks — unclosed database connections, file handles, network sockets
  • Race conditions in concurrent code
  • Type coercion errors that fail silently
  • Infinite loops and unbounded recursion
  • Memory leaks from retained references
Example: A function opens a database connection, performs a query, but does not close the connection in the error path. Under sustained load, the connection pool is exhausted and the service stops responding.
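The connection-leak example above can be sketched like this; the fix is to release the resource on the error path as well, with `finally` or a context manager:

```python
import sqlite3

def leaky_query(db_path: str, sql: str):
    # FLAGGED: if execute() raises, the connection is never closed.
    conn = sqlite3.connect(db_path)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

def safe_query(db_path: str, sql: str):
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()  # runs on both the success path and the error path
```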

Test Coverage

What it measures: Whether your tests actually verify the behavior of your code, not just whether tests exist.

Why it is weighted high: Untested code breaks in production without warning. Tests are your safety net for every future change. A codebase without meaningful test coverage cannot be changed with confidence.

What Octokraft evaluates:
  • Test presence per module — do modules that contain business logic also contain tests?
  • Structural coverage — what percentage of exported functions are exercised by test callers?
  • Assertion density — how many assertions per test function? A test with no assertions verifies nothing.
  • Mock usage patterns — are tests isolated, or do they depend on external services?
Important distinction: Octokraft does not simply measure line coverage. A test file that calls functions but never asserts on their output provides false confidence. Octokraft measures whether your tests actually verify behavior.

Example: A module exports 12 public functions. Tests exist, but they only exercise 3 of those functions. The other 9 have zero test callers and would break silently on any change.
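A minimal illustration of the assertion-density distinction (the function and tests are hypothetical):

```python
def add(a: int, b: int) -> int:
    return a + b

def test_add_without_assertions():
    # Executes the code and "passes", but verifies nothing:
    # zero assertions, so line coverage here is false confidence.
    add(2, 3)

def test_add_with_assertions():
    # Actually verifies behavior.
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
```

Both tests produce identical line coverage, but only the second one would catch a regression in `add`.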

Code Smells

What it measures: Anti-patterns that slow development and increase the probability of bugs.

Why it is baseline weight: Code smells do not directly cause production failures, but they make your codebase harder to understand, harder to change, and more likely to accumulate bugs over time. They represent the chronic condition of technical debt.

Issues in this category:
  • God classes — classes with too many methods or responsibilities
  • Deeply nested conditionals that are difficult to reason about
  • Overly complex functions (high cyclomatic complexity)
  • Long parameter lists that indicate poor abstraction
  • Feature envy — functions that use more data from another module than their own
  • Primitive obsession — using raw strings and integers where domain types should exist
Example: A single controller class handles user authentication, payment processing, email sending, and report generation. Any change to one concern risks breaking another.
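One of the smells listed above, deeply nested conditionals, is easy to show in miniature; both versions below (hypothetical discount logic) behave identically, but the guard-clause form is flat enough to reason about at a glance:

```python
def discount_nested(user):
    # FLAGGED: deeply nested conditionals are hard to reason about.
    if user is not None:
        if user.get("active"):
            if user.get("orders", 0) > 10:
                return 0.2
            else:
                return 0.1
        else:
            return 0.0
    else:
        return 0.0

def discount_guarded(user):
    # Same behavior, flattened with guard clauses.
    if user is None or not user.get("active"):
        return 0.0
    return 0.2 if user.get("orders", 0) > 10 else 0.1
```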

Duplication

What it measures: Repeated code that should be consolidated.

Why it is weighted lower: Duplicated code is technical debt, but it does not directly cause bugs. The risk is indirect: when a bug is fixed in one copy but not the other, or when behavior diverges between copies over time.

Issues in this category:
  • Copy-paste code blocks across files
  • Similar logic repeated with minor variations
  • Redundant implementations of the same algorithm
  • Duplicated validation rules across endpoints
Example: The same date-parsing logic appears in three different services. A timezone bug is fixed in one but not the other two, causing inconsistent behavior across the application.
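The date-parsing example above, sketched with hypothetical service helpers; the point of consolidation is that a timezone bug gets fixed in exactly one place:

```python
from datetime import datetime, timezone

# FLAGGED: the same parsing logic copy-pasted into multiple services.
def parse_invoice_date(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)

def parse_shipping_date(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)

# Consolidated: one implementation shared by every caller.
def parse_date(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
```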

Dead Code

What it measures: Code that exists but is never executed.

Why it is weighted lower: Dead code does not cause bugs; it causes confusion. Developers waste time reading, maintaining, and working around code that serves no purpose. It inflates the codebase and creates misleading search results.

How Octokraft detects it: Octokraft builds a call graph across your codebase. If an exported function has no callers and no references anywhere in the project, it is flagged as dead code. Internal or private functions are not flagged, since they may be intended for near-term use.

Issues in this category:
  • Functions that are never called from anywhere in the codebase
  • Variables assigned but never read
  • Unreachable code paths after early returns or throws
  • Unused imports and dependencies
  • Feature-flagged code where the flag has been permanently disabled
Example: A utility function formatCurrency() was replaced by a library call six months ago. The original function still exists, still shows up in search results, and new developers occasionally call it instead of the library version.
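Octokraft's actual call graph spans files and languages; a toy single-module version of the same idea, using Python's `ast` module, looks like this:

```python
import ast

def unreferenced_top_level_functions(source: str) -> set[str]:
    """Toy sketch: flag top-level functions whose names are never
    loaded anywhere else in the module."""
    tree = ast.parse(source)
    defined = {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}
    referenced = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return defined - referenced

module = """
def format_currency(cents):
    return f"${cents / 100:.2f}"

def total(prices):
    return sum(prices)

print(total([1, 2, 3]))
"""
print(unreferenced_top_level_functions(module))  # {'format_currency'}
```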

Consistency

What it measures: Whether your codebase follows its own established patterns.

Why it is weighted lower: Inconsistency rarely causes bugs directly, but it significantly slows onboarding, increases cognitive load, and makes code review harder. When every file handles errors differently, developers cannot build reliable intuitions about how the codebase behaves.

Issues in this category:
  • Mixed naming conventions (camelCase in some files, snake_case in others)
  • Inconsistent error handling patterns across modules
  • Different approaches to the same problem in different parts of the codebase
  • Naming patterns that deviate from established project conventions
How this connects to Conventions: Issues in this category are often generated by comparing code against the conventions Octokraft has detected in your codebase (see Architecture Intelligence). If 90% of your functions follow one naming convention, the other 10% are flagged here.

Example: Most of the codebase returns errors using a Result type, but three modules use thrown exceptions instead. A developer working across modules has to remember two different error handling paradigms.

Compliance

What it measures: Adherence to framework best practices and coding standards.

Why it is weighted lowest: Standards violations are the least likely to cause production issues. They represent deviations from recommended practices rather than concrete risks. However, they still matter for long-term maintainability.

Issues in this category:
  • Framework best practice violations
  • Coding standard deviations
  • Missing or incomplete documentation on public APIs
  • Deprecated API usage
Example: A project uses a web framework that recommends middleware-based authentication, but several endpoints implement auth checks inline. The code works, but it diverges from the framework’s intended patterns and is harder to audit.

Severity Levels

Every issue has a severity level that determines how heavily it impacts your health score. Severity carries a penalty multiplier:
| Severity | Multiplier | Meaning |
| --- | --- | --- |
| Critical | 5.0x | Must fix immediately. Data loss, security breach, or system crash. |
| High | 3.0x | Should fix before merge. Significant bugs or performance issues. |
| Medium | 1.5x | Fix when possible. Code quality issues that accumulate over time. |
| Low | 1.0x | Nice to fix. Minor improvements that incrementally improve the codebase. |
| Info | 0.3x | Awareness only. Suggestions and observations, not problems. |
The combination of category weight and severity multiplier determines the actual impact on your score. A critical security issue (2.0x category weight times 5.0x severity multiplier = 10.0x impact) affects your score far more than an info-level compliance suggestion (0.5x times 0.3x = 0.15x impact).
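The two worked examples from the paragraph above, computed directly:

```python
SEVERITY_MULTIPLIER = {
    "critical": 5.0, "high": 3.0, "medium": 1.5, "low": 1.0, "info": 0.3,
}

def issue_impact(category_weight: float, severity: str) -> float:
    """Combined impact factor: category weight times severity multiplier."""
    return category_weight * SEVERITY_MULTIPLIER[severity]

print(issue_impact(2.0, "critical"))  # 10.0  (critical security issue)
print(issue_impact(0.5, "info"))      # 0.15  (info-level compliance note)
```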

Grades

Your health score maps to a letter grade:
| Grade | Score Range |
| --- | --- |
| A+ | 95–100 |
| A | 90–94 |
| A- | 85–89 |
| B+ | 80–84 |
| B | 75–79 |
| B- | 70–74 |
| C+ | 65–69 |
| C | 60–64 |
| C- | 55–59 |
| D+ | 50–54 |
| D | 45–49 |
| D- | 40–44 |
| F | Below 40 |
Most well-maintained codebases land in the B to A- range. A+ requires consistently low issue density across all categories. F indicates systemic problems that need immediate attention.
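The grade table is a straightforward cutoff lookup:

```python
GRADE_CUTOFFS = [
    (95, "A+"), (90, "A"), (85, "A-"), (80, "B+"), (75, "B"), (70, "B-"),
    (65, "C+"), (60, "C"), (55, "C-"), (50, "D+"), (45, "D"), (40, "D-"),
]

def letter_grade(score: float) -> str:
    """Map a 0-100 health score onto the letter-grade table."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"

print(letter_grade(92))  # A
print(letter_grade(39))  # F
```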

When Assessments Run

1. Initialization: When you first connect repositories to Octokraft, a full assessment runs across the entire codebase. This establishes your baseline scores.

2. On PR merge: When a pull request merges to your default branch, Octokraft runs an incremental assessment that analyzes only the changed files and recalculates scores. This keeps your dashboard current without re-analyzing the entire codebase.

3. Manual trigger: You can trigger a full reassessment at any time from the dashboard or via the API.

Drift Alerts

When your health score drops between assessments, Octokraft generates a drift alert. Each alert shows:
  • The score before and after the change
  • Which categories were affected
  • Which specific issues caused the decline
  • The pull request or event that triggered the change
Drift alerts catch gradual quality degradation — the kind where each individual PR looks fine in isolation, but the cumulative effect over weeks or months is a steadily declining codebase. By the time someone notices “the code feels worse,” dozens of small regressions have compounded. Drift alerts surface each one as it happens.

Dismiss Rules

Not every finding applies to every codebase. Some issues are intentional trade-offs, legacy code scheduled for removal, or patterns specific to your domain. Dismiss rules let you suppress specific findings without losing visibility.

Global Dismiss

Suppress a specific issue type everywhere in the project. Use this for patterns you have deliberately chosen to allow.

File-Scoped Dismiss

Suppress an issue only for a specific file. Use this for legacy code, generated files, or modules scheduled for replacement.
Dismissed issues are still detected and recorded for audit purposes. They simply do not count toward your health scores. You can review and reverse dismiss rules at any time.

Where Findings Come From

Issues on your dashboard come from 5 complementary sources, all merged into a unified view:
| Source | What It Detects |
| --- | --- |
| Static analyzers | Language-specific linting rules, syntax issues, known anti-patterns |
| AI analysis | Semantic issues that require understanding intent: logic errors, missing edge cases, architectural concerns |
| Code graph | Structural problems detected from cross-file analysis: dead code, circular dependencies, coupling metrics |
| Convention deviations | Code that breaks patterns established elsewhere in your codebase |
| Architecture review | System-level design issues: modularity violations, scaling bottlenecks, pattern inconsistencies |
You do not need to configure these sources individually. Octokraft runs all applicable analyzers automatically based on the languages detected in your repository.

What Gets Ignored

Standard non-source directories are automatically excluded from analysis:
  • node_modules/, .venv/, vendor/ — dependency directories
  • dist/, build/, target/, bin/ — build output
  • __pycache__/, .pytest_cache/ — runtime caches
  • coverage/ — test coverage output
  • .git/ — version control internals
These are build artifacts and third-party dependencies, not your code. Analyzing them would inflate issue counts and produce meaningless results. Octokraft focuses exclusively on code your team writes and maintains.