A Test and Verification Strategy

Summary

A discussion of how to organize testing to minimize the overall time. For medical devices, the following types of tests usually need to be done:

  • unit testing
  • automated system testing
  • manual system testing
  • session based testing
  • requirements testing
  • user testing

Observations

top-level

  • The tests can interact with each other.
    • detecting a bug at one level can cause a cascade effect: tests at later levels will need to be redone.
    • a test can leave the system in a new state, rather than the default out-of-the-box state
  • Testing is probabilistic.
    • A test passing at a lower level suggests (but does not guarantee) success of that behavior in later tests. e.g. a passing unit test usually means that system calls to that function will likely succeed.
    • A test failing at one level increases the odds it will appear in higher-level testing. e.g. a failing unit test will likely show up as a failure in requirements testing.
    • Overall, a passing test reduces the probability of a failure in the field, but it does not reduce the odds of that failure to 0%.
    • The probabilities are cumulative. Since the different testing processes do their work in different ways and from different perspectives, they double-check each other's results. For example, a behavior that passes 10 unit tests, a couple of system tests, a few manual tests, and finally 1 requirement test has lower odds of failing in the field than if just the 10 UTs were done (see the sketch after this list).
  • Some tests are more important than others. Some behaviors are critical to the overall behavior of the system. Other behaviors are more local to specific parts.
  • Behaviors can be grouped. Some behaviors are global, but many are specific to how a user interacts. For example, the idea that there is a "happy path" implies a bell curve: some paths a user commonly takes, others he/she rarely takes.
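
As a rough illustration of the cumulative effect, here is a minimal sketch in Python. The escape rates are made-up numbers purely for illustration, and the independence assumption (each layer catches bugs independently of the others) is a simplification:

    # Probability that a bug escapes every test layer, assuming
    # (simplistically) that the layers catch bugs independently.
    def residual_failure_odds(escape_rates: list[float]) -> float:
        odds = 1.0
        for rate in escape_rates:
            odds *= rate
        return odds

    # Made-up odds that each layer misses a given bug:
    escape_rates = [
        0.30,  # unit tests
        0.40,  # automated system tests
        0.50,  # manual tests
        0.60,  # requirements test
    ]

    print(residual_failure_odds(escape_rates))  # 0.036, i.e. ~3.6%

Each layer alone misses the bug 30-60% of the time, yet together they drive the residual odds down to a few percent, which is the cumulative effect described above.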

unit testing

Since these are automated, they run very quickly, and they can be run in parallel to be even faster. There can be thousands of them, and they will typically cover the entire code base.

However, they don't tend to cover system-level behaviors, only function- or module-level behaviors. They are more difficult to do for hardware or mechanical behaviors.
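
To make that concrete, here is a minimal pytest-style sketch. The dose_rate function and its parameters are hypothetical, not taken from any real device:

    import pytest

    def dose_rate(total_dose_ml: float, duration_hr: float) -> float:
        """Return the infusion rate in ml/hr (hypothetical example)."""
        if duration_hr <= 0:
            raise ValueError("duration must be positive")
        return total_dose_ml / duration_hr

    def test_dose_rate_nominal():
        # the normal calculation
        assert dose_rate(100.0, 4.0) == 25.0

    def test_dose_rate_rejects_zero_duration():
        # the error path is covered too
        with pytest.raises(ValueError):
            dose_rate(100.0, 0.0)

Tests like these run in milliseconds, which is why thousands of them can finish in minutes.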

automated system testing

It may be possible for the particular device to do automated system-level testing. In short, these use a "simulated user" to drive the UI: e.g. button presses are simulated, the screen is captured and confirmed for correct behavior, and any communication to the hardware or other destinations is captured and confirmed as well.

These tend to take much longer, since the system as a whole has to react to each test stimulus, and all of those reactions may have delays.

These will likely require some abstraction layer (or mocks) to simulate hardware behaviors and communication responses.
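
Here is a minimal sketch of what such a test might look like. Every class and method name is a placeholder; the real abstraction layer depends on the device's architecture, and DemoUI stands in for the actual application so the sketch is self-contained:

    class MockPump:
        """Stands in for the real pump; records the commands sent to it."""
        def __init__(self):
            self.commands = []

        def send(self, command: str):
            self.commands.append(command)

    class DemoUI:
        """A stand-in for the real application's UI layer."""
        def __init__(self):
            self._screen = "Idle"
            self._pump = None

        def attach_hardware(self, pump):
            self._pump = pump

        def handle_button(self, button: str):
            if button == "start":
                self._pump.send("start_infusion")
                self._screen = "Running"

        def screen_text(self) -> str:
            return self._screen

    class SimulatedUser:
        """Drives the UI the way a tester's fingers would."""
        def __init__(self, ui):
            self.ui = ui

        def press(self, button: str):
            self.ui.handle_button(button)

    def test_start_button_starts_infusion():
        ui = DemoUI()
        pump = MockPump()
        ui.attach_hardware(pump)

        SimulatedUser(ui).press("start")

        # confirm the screen shows the expected state...
        assert ui.screen_text() == "Running"
        # ...and that the hardware received the expected command
        assert pump.commands == ["start_infusion"]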

manual system testing

These do not have any simulated components. The manual tester performs actions that a typical user would do, and those actions work using the real hardware and other components in the system.

The setup and teardown times are usually longer than for unit testing or automated system testing. For example, if a treatment takes 4 hours, then that single test will take 4 hours plus the setup and teardown times.

Also, there is the human factor. Manual testers can observe aspects of system behavior that an automated system test cannot. So there are inherent distraction events in manual testing. Some of these are desired (they find bugs) and some are not.

session based testing

This uses manual testers to focus on specific portions of the system behaviors, e.g. a specific therapy delivery type, etc. The focus is on causing failures. They do actions that are not reasonable, not expected, and not likely to be done by a competent user. This testing finds rare, one-off bugs that can happen in the field.

These take a long time. But the bugs that are found are not going to be covered by any of the other testing. On the other hand, they can occur in the field by accident or by innocent incompetence. And they can be catastrophic when they happen in the field.

requirements testing

These usually do not have any simulated components. There may be some tests that require controlled changes to the system's environment to cause the specific conditions needed by a particular requirement.

These are typically manual. But there can be automated portions, as long as the automated test system itself can be verified for use in those tests.

Since these are "formal" any failures may require substantial effort to be recorded, analysed, fixed and re-verified. The time impact of these can be significant to the overall project time since FDA submission can be attempted until these all requirements are verified/addressed.

user testing

These are usually done last. The tests are performed by users, e.g. nurses, doctors, patients, etc. These tests confirm that the user interface design is adequate and "simple enough" that the device can be used without user confusion, failure, hesitation, etc.

These, again, are "formal", and any failures may require substantial effort to be addressed adequately before an FDA submission can be done.

What can we do?

Phases

A fairly clear strategy is to split the tests into multiple phases. Each phase feeds into the next phase of the same type of testing and into the first phase of the next type of testing. For example (a minimal sketch of this triggering logic follows the list):

  • Phase1 UTs passing triggers Phase2 of UTs and Phase1 of automated system tests
  • Phase2 automated system tests wait for Phase2 UTs and Phase1 automated system tests to pass.
  • This part can be adjusted.
    • If Phase1 and Phase2 are very fast (say a few hours), then they could simply run in parallel. Of course, the necessary equipment has to be available.
    • If the UTs by themselves are very fast, then do all UTs and have that trigger Phase1 automated tests
    • other strategies depend on specifics of the project and device/app being tested.
  • If Phase1 & Phase2 of UTs & automated tests pass, then most (all?) of the manual tests are highly likely to pass.
    • Manual testing can have a happy-path Phase1 with, for example, the top 5 or so paths that 90% of users will take.
    • If that passes, then the effort behind session based testing is worth it.
    • note that the number of testers available is a big constraint. So using them efficiently is a key strategy concern.
    • Also manual testers are human.
      • Some folks are better at some activities than others. Keep track of what they want/like to do and help them out.
      • They will become bored. Swap them around periodically to keep them interested and invested in their efforts. Talk to them!
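
Here is a minimal sketch of that triggering logic. The phase names and prerequisites are illustrative; in practice this would likely live in a CI pipeline as stages with dependencies rather than in a script:

    # Each phase lists the phases that must pass before it can run.
    PHASES = {
        "ut_phase1":     [],
        "ut_phase2":     ["ut_phase1"],
        "auto_phase1":   ["ut_phase1"],
        "auto_phase2":   ["ut_phase2", "auto_phase1"],
        "manual_phase1": ["ut_phase2", "auto_phase2"],
    }

    def run_phase(name: str) -> bool:
        print(f"running {name} ...")
        return True  # placeholder: invoke the real test runner here

    def run_all(phases: dict[str, list[str]]) -> None:
        passed: set[str] = set()
        pending = dict(phases)
        while pending:
            # find every phase whose prerequisites have all passed
            ready = [n for n, deps in pending.items()
                     if all(d in passed for d in deps)]
            if not ready:
                raise RuntimeError("blocked: a prerequisite phase failed")
            for name in ready:
                del pending[name]
                if run_phase(name):
                    passed.add(name)

    run_all(PHASES)

The phases in each "ready" batch could run in parallel if equipment allows, which matches the adjustments described above.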

Is there a Phase3?

It depends on the project.

  • If there are some behaviors that take a very long time to test, it will likely make sense to ensure all the other test types have shown the system is in good shape before investing the extra time.
  • If a failure in a Phase causes substantial risk to the project deadlines, then it makes sense to test those behaviors after all the other testing is okay.
  • And since manual testing normally takes much longer, it does make sense to push the longest manual tests into a Phase3.

How to split up the Phases

All that leads to the question: "What goes into Phase1 vs Phase2?"

Since tests are probabilistic and they can be grouped, we can use that to carefully choose which tests belong to which phase. But choosing the groups becomes critical. A good, rough start is (a sketch of these rules follows the list):

  • choose happy-path behaviors. The happy-path allows us to navigate to any/all other behaviors much more easily. These typically belong in Phase1.
  • critical behaviors need to work. If they fail, you can never submit to the FDA. But:
    • if the behavior is easy/fast to test, great, do that in Phase1.
    • otherwise put it in Phase2
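
Here is a sketch of those rules in code. The fields on Behavior and the 30-minute threshold are illustrative assumptions, not fixed numbers:

    from dataclasses import dataclass

    @dataclass
    class Behavior:
        name: str
        happy_path: bool   # on one of the common user paths?
        critical: bool     # must work before an FDA submission?
        test_minutes: int  # rough time needed to test it

    def assign_phase(b: Behavior) -> int:
        if b.happy_path:
            return 1  # happy paths unlock navigation to everything else
        if b.critical:
            # critical and quick: front-load it; critical but slow: defer it
            return 1 if b.test_minutes <= 30 else 2
        return 2

    behaviors = [
        Behavior("power on to main screen",   True,  True, 5),
        Behavior("standard therapy delivery", True,  True, 20),
        Behavior("occlusion alarm",           False, True, 15),
        Behavior("4-hour extended therapy",   False, True, 240),
    ]

    for b in behaviors:
        print(f"Phase{assign_phase(b)}: {b.name}")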

Why do all this?

That may seem like a lot of work. So what is the rationale for all that effort? Is it worth all that energy and time?

In short, yes.

lower project risks

The sooner information is known, the quicker any reaction can happen. A shorter feedback cycle means the individual teams can react to that information sooner and also use it to pass information on to other teams.

track hiccups

The continual flow of information is useful to discover when something/anything goes off the rails. In a well functioning project, there are few, if any, surprises.

When one does happen, it is very much worth the time to do a risk analysis on it.

  • what is the root cause of the failure?
  • what is the probability it will happen again?
  • what are the mitigations, if any, that can be done?
  • what are the risks in not fixing the root cause?
  • what are the risks in fixing the root cause?

Over time, the project becomes more stable as these issues get fixed.

coordination between teams

The more discussion and coordination between teams, the lower the risks for the overall project. Why? Because people know things. If they let others know about them, then risks can be fixed/mitigated before they occur.

definition of success

The definition of success (DOS) is to have a well-known, well-expected end date for the project and based on that, a successful submission and approval by the FDA.

Any "surprise!" issues that occur at the end of the project have a bigger impact on that end date then the extra effort spent fixing/mitigating those issues earlier in the project.

Why? Because fixing a problem earlier may also fix other potential side effects of that problem. Fixing it earlier is simply better.

- John Arrizza