How to test software with invalid and unexpected inputs

Functional testing tells you whether your software does what it is supposed to do. It does not tell you what happens when the software receives something it was never supposed to receive. For most software, that second question goes unanswered until a failure in the field or a security researcher answers it for you.

Testing with invalid and unexpected inputs is the discipline that closes that gap. It is not a replacement for functional testing. It is the complementary activity that explores the input space functional testing structurally cannot reach, and finds the vulnerabilities and failure modes that live there.

This guide explains how to generate and execute test cases that go beyond the specification, covering the two main approaches to invalid input generation, how to apply them to different types of interfaces and systems, and where automated fuzz testing fits as the scalable implementation of the same principles.

What Is Invalid Input Testing?

Invalid input testing is the practice of deliberately sending a system inputs that fall outside its specification to observe how it behaves. The inputs might be the wrong data type, the wrong length, outside the valid range, in the wrong sequence, structurally malformed, or simply values the system was never designed to process. The goal is not to verify correct behaviour. It is to find incorrect behaviour under conditions the specification did not plan for.

The term sits within the broader category of negative testing, which covers all testing that verifies system behaviour under invalid or unexpected conditions rather than verifying intended behaviour under valid conditions. Invalid input testing focuses specifically on the input layer: the fields, messages, packets, and data structures that the system processes, and what happens when their content violates the assumptions the developers built into the processing logic.

The failure modes that invalid input testing finds include memory safety violations triggered by inputs longer than expected buffers, parser crashes caused by malformed data structures, error handling failures that expose internal state in responses to invalid requests, authentication bypasses triggered by inputs that confuse the authentication logic, and denial-of-service conditions caused by inputs that trigger disproportionate resource consumption. These are not theoretical risks. They are the vulnerability classes found most consistently when testing is extended beyond the specification, and they are the classes that appear most regularly in real-world security incidents.


Mutation-Based vs Generation-Based Input Testing

Two complementary approaches exist for generating invalid and unexpected test cases: mutation-based and generation-based. Understanding the difference between them helps teams choose the right approach for their specific testing context and combine them effectively.

Mutation-based input generation starts from valid inputs and modifies them to produce invalid ones. A valid protocol message becomes an invalid test case by flipping a bit, changing a field value to something outside the valid range, truncating the message, extending it beyond the expected length, or substituting a prohibited value in a constrained field. The advantage of mutation-based generation is that it requires only valid input samples to get started and produces test cases that are structurally close to valid inputs, making it more likely that they pass initial framing checks and reach the application logic being tested.

The limitation of mutation-based generation is that it is bounded by the valid inputs used as seeds. It can only produce variations of what it starts from. If the seed inputs do not cover all message types, all state transitions, or all interface entry points, the mutation-based test cases will not cover them either. It is effective for finding vulnerabilities near the valid input space but less effective for finding vulnerabilities that require structurally unusual inputs to reach.
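As a concrete illustration, the mutation operators described above can be sketched in a few lines of Python. The seed bytes here are a valid Modbus/TCP read request used purely as an example; the operator set and the mutation sizes are illustrative, not a production fuzzer:

```python
import random

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one randomly chosen mutation operator to a valid seed input."""
    op = rng.choice(["bitflip", "truncate", "extend", "substitute"])
    data = bytearray(seed)
    if op == "bitflip" and data:
        i = rng.randrange(len(data))
        data[i] ^= 1 << rng.randrange(8)          # flip a single bit
    elif op == "truncate" and len(data) > 1:
        data = data[: rng.randrange(1, len(data))]  # cut the message short
    elif op == "extend":
        extra = rng.randrange(1, 65)
        data += bytes(rng.randrange(256) for _ in range(extra))  # grow past the expected length
    elif op == "substitute" and data:
        data[rng.randrange(len(data))] = 0xFF     # overwrite a field byte with a boundary value
    return bytes(data)

# A valid Modbus/TCP "read holding registers" request as the seed.
seed = bytes.fromhex("000100000006010300000001")
rng = random.Random(1)
cases = [mutate(seed, rng) for _ in range(5)]
```

Every test case is a small structural variation of the seed, which is exactly why mutation-based cases tend to pass framing checks and reach the application logic.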

Generation-based input testing constructs test cases from a model of the input space rather than from existing valid inputs. A formal model of a protocol, a data format, or an interface specification describes the structure of valid inputs, and generation-based testing produces test cases that conform to that structure in some fields while violating it in others in systematic, targeted ways. Generation-based testing is more thorough than mutation-based testing because it can reach parts of the input space that no existing valid input sample represents, but it requires more upfront investment in building or obtaining the model.

In practice the two approaches are complementary. Mutation-based testing is faster to get started and effective for finding vulnerabilities close to the valid input space. Generation-based testing provides more systematic coverage and is necessary for finding vulnerabilities in less-trafficked areas of the input space. A testing programme that uses both approaches covers significantly more ground than one that relies on either alone.


Why Testing Beyond the Specification Matters

The specification defines what the software is supposed to do. It does not define what the software actually does when it receives something outside that specification. The gap between those two things is where most security vulnerabilities live.

Developers test against the specification. They write unit tests for the inputs the system is designed to handle. They test the happy path and the documented error paths. They test the boundary values they know about. What they do not systematically test is the space outside the specification, because the specification does not describe it and their testing tools are designed around the specification.

The inputs that cause security vulnerabilities are almost always outside the specification. A buffer overflow is triggered by an input longer than the developer assumed any input would be. A parser crash is triggered by a malformed structure the parser was not designed to handle. An authentication bypass is triggered by an input that puts the authentication logic into a state its designers did not anticipate. None of these inputs appear in a specification-driven test plan, which is why they are not found by specification-driven testing.

For systems that process external inputs, particularly those that communicate over networks or accept data from untrusted sources, this gap between specification-driven testing and the real input space represents a significant and largely unexplored attack surface. Extending testing beyond the specification is not a nice-to-have. It is the activity that finds the vulnerabilities that specification-driven testing structurally cannot find, and that attackers will find if developers do not find them first.


How to Generate Invalid and Unexpected Test Cases

Generating useful invalid and unexpected test cases requires a systematic approach. Random inputs are rarely effective because most random inputs fail early validation checks before reaching the logic where vulnerabilities sit. The goal is to generate inputs that are wrong in specific, targeted ways while being structurally plausible enough to reach the processing logic being tested.

Boundary value analysis is the starting point for most invalid input testing. For every field with a defined valid range, generate test cases at the minimum valid value, the maximum valid value, one below the minimum, one above the maximum, and values significantly outside the range including the maximum value for the data type. Boundaries are disproportionately likely to contain bugs because developers test the interior of ranges and neglect the edges. For string and binary fields, test at the maximum defined length, one byte beyond it, and at the maximum length the data type can represent.

Type violation testing sends the wrong data type for a field: a string where an integer is expected, a floating point value where an integer is expected, a negative value where an unsigned integer is expected, and a null value in a required field. Many parsers and validation routines handle type violations incorrectly, producing crashes or unexpected behaviour when the type assumption built into the processing logic is violated.

Format violation testing sends inputs that are structurally malformed in ways specific to the format being tested. For a length-prefixed binary protocol, send a message where the declared length is longer than the actual payload, shorter than it, zero, and the maximum value for the length field type. For a text-based format, send inputs with missing delimiters, extra delimiters, incorrect encoding, and prohibited character sequences. For a structured data format, send inputs with missing required fields, duplicate fields, and fields in an unexpected order.

Sequence testing generates inputs in sequences that violate the expected state machine of the protocol or interface. Send a response message before a request, send a message that is only valid in an authenticated session before authentication, send the same message twice in rapid succession, and send messages in orders that the specification marks as invalid. State machine bugs are found by reaching states the developers did not test, which requires testing sequences rather than individual messages.

Resource exhaustion testing sends inputs designed to trigger disproportionate resource consumption: extremely long inputs that require proportionally large allocations to process, inputs that trigger deeply recursive processing, and inputs that cause the system to open connections or allocate resources without closing or releasing them. These find the denial-of-service conditions that affect availability even when they do not produce exploitable memory safety issues.
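Two simple payload generators along these lines: deeply nested JSON to stress recursive parsers, and a single oversized field to force a proportionally large allocation. The sizes are illustrative:

```python
def deep_json(depth: int) -> bytes:
    """A JSON document nested `depth` arrays deep; recursive parsers may overflow the stack."""
    return ("[" * depth + "]" * depth).encode()

def oversized_field(size: int) -> bytes:
    """A single oversized field value that forces a proportionally large allocation."""
    return b"A" * size

payloads = [deep_json(100_000), oversized_field(16 * 1024 * 1024)]
```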


Testing Protocol Implementations and Embedded Systems

Protocol implementations and embedded systems present specific challenges for invalid input testing that do not apply to web applications or desktop software. Understanding those challenges is necessary to design testing that actually reaches the vulnerabilities in these systems.

The framing challenge is the most significant. Network protocols use framing mechanisms, checksums, and validation rules at the transport and framing layer that reject structurally invalid inputs before they reach the application logic. A test case that sends random bytes to a Modbus RTU endpoint is rejected by the CRC check. A test case that sends a malformed DLMS COSEM frame is rejected by the framing layer. To reach the application logic where most vulnerabilities sit, test cases need to be invalid in ways that pass the framing layer checks but violate the application layer assumptions.

This is the distinction between generic invalid input testing and protocol-aware invalid input testing. Generic testing can find vulnerabilities in the framing layer itself, but it cannot systematically find vulnerabilities in the application logic of a protocol implementation. Protocol-aware testing generates inputs that conform to the framing requirements while being invalid at the application layer, which is the approach necessary to test the processing logic that handles message content, field values, and command semantics.

For embedded systems that cannot run standard test frameworks, the testing approach needs to be external. Rather than injecting test cases through an in-process testing harness, the test cases are delivered over the network or communication interface the device exposes, and behaviour is observed from outside. This requires a test environment that can generate protocol-appropriate test cases, deliver them to the device under test, and observe the device’s responses and behaviour, including crashes and hangs that manifest as connection loss or non-responsiveness.

Legacy devices present additional challenges. Many OT and industrial devices were not designed with testability in mind. They may not provide detailed error responses that allow test cases to be correlated with specific failure modes. They may have communication interfaces that are sensitive to timing and volume of test traffic. Testing these devices requires an approach that controls the rate and volume of test delivery and uses observable external behaviour, including the presence or absence of responses, as the primary indicator of findings.


Where Invalid Input Testing Fits in the Development Lifecycle

Invalid input testing is most effective when it is integrated throughout the development lifecycle rather than conducted as a single activity before release. The earlier invalid input vulnerabilities are found, the cheaper they are to fix, and the less likely they are to reflect architectural decisions that are expensive to change.

At the unit testing stage, invalid input testing applies to individual functions and components that process external data. A parsing function should be unit tested not just with valid inputs but with inputs that violate the format it expects. A field validation function should be tested not just with values inside the valid range but with values at and beyond the boundaries. This is the cheapest point to find and fix the bugs that invalid input testing surfaces, because they are isolated to specific functions rather than arising from system-level interactions.

At the integration testing stage, invalid input testing applies to interfaces between components and to the interfaces the integrated system exposes. Integration often introduces new failure modes that component-level testing does not surface, because the system’s behaviour under invalid inputs depends on how components interact. A parser that handles malformed inputs correctly in isolation may cause a crash when its output is passed to a downstream component that does not expect the error state the parser returns.

Pre-release, system-level invalid input testing provides the evidence that security assurance and compliance requirements demand. For products subject to IEC 62443-4-1 Practice 6, this is the stage at which the documented vulnerability testing evidence needs to be produced. The testing methodology, scope, findings, and traceability to standard requirements all need to be captured at this stage in a form that will satisfy a certification audit.

Post-release, invalid input testing provides regression coverage as implementations change. New message types, updated parsing logic, and changes to underlying libraries can all introduce new vulnerabilities. Running the invalid input test corpus against each significant update catches regressions before they reach the field.


What Good Test Output Looks Like

The output of an invalid input testing programme determines whether the findings it produces are acted on. Poor output is one of the most common reasons invalid input testing fails to improve security posture: the testing finds real vulnerabilities but the output does not give the engineering team what it needs to reproduce, understand, and fix them.

Each finding needs the exact input that triggered it. Not a description of the input category, not a characterisation of the vulnerability class, but the exact bytes, field values, and sequence that caused the failure. Without this, the finding cannot be reproduced, the root cause cannot be identified, and the fix cannot be verified. For protocol testing, this means the exact message content including all field values, the protocol state at the time the message was sent, and the prior message sequence that established that state.

The observed behaviour needs to be documented with equal precision. A crash needs the memory address, the instruction that failed, and the input field that caused the overflow, not just a note that the target became unresponsive. An unexpected response needs the exact response content and an explanation of why it is unexpected relative to the specification. A resource exhaustion condition needs the specific input that triggered it, the resource type affected, and the magnitude of the consumption observed.

Severity classification needs to reflect exploitability in the specific deployment context. A crash on an unauthenticated interface is a different risk from the same crash on an interface that requires prior authentication. A finding that leaks memory addresses is a different risk from one that causes a clean restart. The classification needs to give the engineering team a clear prioritisation basis, not just a list of things that went wrong ordered by how bad the crash looked.
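The finding fields described in this section can be captured in a simple record. The structure and the example values below are illustrative, not any tool's actual output format:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One reproducible finding: exact input, state, observed behaviour, severity."""
    triggering_input: bytes  # the exact bytes that caused the failure
    protocol_state: str      # protocol state at the time the message was sent
    prior_sequence: list     # the message sequence that established that state
    observed: str            # precise description of the observed behaviour
    severity: str            # classification reflecting the deployment context

finding = Finding(
    triggering_input=bytes.fromhex("0000ffff"),
    protocol_state="UNAUTHENTICATED",
    prior_sequence=[],
    observed="target stopped responding after the message; reconnect failed",
    severity="high: crash reachable on an unauthenticated interface",
)
```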


How ProtoCrawler Automates Invalid Input Testing for Protocols

ProtoCrawler is CyTAL’s automated protocol fuzz testing platform. It implements the invalid input testing principles described in this guide at the scale and depth that manual testing cannot achieve for protocol implementations.

For each supported protocol, ProtoCrawler uses a formal protocol model to generate test cases that combine both mutation-based and generation-based approaches. Mutation-based test cases modify valid protocol messages in targeted ways: flipping field values, extending lengths, truncating payloads, and substituting prohibited values. Generation-based test cases construct messages from the protocol model with specific, targeted violations at the application layer while conforming to framing requirements. Together they provide coverage of the input space that neither approach alone achieves.

The state-aware testing capability drives the protocol implementation through its state machine and generates targeted invalid inputs at each state. This is the capability that finds state machine bugs and the vulnerabilities that only manifest in specific protocol states, which manual testing and generic fuzz testing cannot reach systematically.

The output captures each finding with the precision that actionable results require: the exact triggering input, the protocol state at the time it was sent, the observed behaviour, and the severity classification based on exploitability in the deployment context. Every finding maps directly to IEC 62443 compliance requirements, producing audit-ready evidence for SVV-3 vulnerability testing, CR 3.5 input validation, and CR 7.1 denial-of-service protection.

ProtoCrawler supports more than 100 protocols including Modbus, DNP3, IEC 61850, IEC 60870-5-104, GTP-C, GTP-U, DLMS COSEM, MQTT, SS7, and Diameter. For the full protocol list, see the protocol models page.

Ready to extend your test coverage beyond the specification? Book a demo to see how ProtoCrawler generates and executes invalid input test cases for the protocols your systems implement.


Common Questions About Invalid Input Testing

How is invalid input testing different from fuzz testing?

Fuzz testing is the automated, scalable implementation of invalid input testing. Manual invalid input testing applies the same principles but generates test cases by hand, which limits the volume and coverage achievable. Fuzz testing automates test case generation and execution, enabling systematic coverage of the input space at a scale that manual testing cannot match. For protocol implementations, automated fuzz testing is effectively required to achieve meaningful coverage because the input space is too large to explore manually.

Do I need source code access to test with invalid inputs?

No. Black-box testing, which sends inputs to the system and observes its external behaviour, does not require source code access. It is the appropriate approach for testing commercial devices, third-party components, and deployed systems where source code is not available. Source code access enables additional techniques such as coverage-guided fuzzing and white-box analysis, but it is not a prerequisite for finding security-relevant vulnerabilities through invalid input testing.

How do I prioritise which interfaces to test first?

Prioritise interfaces that accept inputs from untrusted sources, that process complex structured data, that implement security-critical functions such as authentication and authorisation, and that have the most significant operational consequences if they fail. For protocol implementations, all external protocol interfaces should be considered in scope. The interfaces most exposed to external inputs and most consequential in failure are the ones to test first.

How many test cases are needed for adequate coverage?

There is no universal answer, because adequate coverage depends on the complexity of the interface, the number of message types and states, and the depth of testing required. For simple interfaces, thousands of test cases may be sufficient. For complex protocol implementations with rich state machines, meaningful coverage requires millions. This is why manual invalid input testing is not sufficient for protocol implementations, and why automated fuzz testing platforms like ProtoCrawler are necessary to achieve the coverage that security assurance requires.

What is the difference between invalid input testing and robustness testing?

Robustness testing is the broader category that assesses how a system behaves under conditions it was not designed for, including invalid inputs, resource constraints, and operational stress. Invalid input testing focuses specifically on the input layer. All invalid input testing is robustness testing, but robustness testing also covers conditions beyond the input layer such as resource exhaustion, timing violations, and environmental stress that are not input-driven.

Ready to extend your test coverage to invalid and unexpected inputs? Book a demo or get in touch to discuss how ProtoCrawler fits into your testing programme.
