Aren Leishman: DiveCAN

Introduction

Closed-circuit rebreathers (CCRs) are life-support systems that recycle a diver’s breathing gas, scrubbing carbon dioxide and maintaining a precise partial pressure of oxygen (PPO2). The electronics that monitor and control these systems are therefore critical to their safe operation; accurate PPO2 monitoring is essential for avoiding both hypoxia and oxygen toxicity, either of which can be rapidly fatal underwater. Furthermore, electronic closed-circuit rebreathers (eCCRs) automatically inject oxygen based on the measured PPO2, removing the diver from the core control loop (though manual controls are still retained for system resiliency).

Shearwater Research has established their DiveCAN bus protocol as the de facto standard for communication between rebreather components: controllers, handsets, heads-up displays, and oxygen monitoring heads. The protocol enables interoperability between systems; a JJ rebreather can use a Shearwater NERD display instead of the default, and handsets can be upgraded over time.

However, Shearwater does not publish the DiveCAN specification, nor do they offer any pathway for hobbyists or small manufacturers to develop compatible hardware. The protocol remains proprietary, despite early discussion of it being an open standard.

The DiveCAN project addresses this gap through reverse engineering and open-source development. Over the past two years, the DiveCAN protocol has been reverse engineered thanks to the efforts of a number of contributors, and I have developed an open-source controller platform that enables divers to build, maintain, and customise their own rebreather electronics. All hardware designs, firmware, and documentation are released under permissive open-source licences.

The project repositories are hosted on GitHub under the QuickRecon organisation:

DiveCAN - Protocol documentation and analysis tools
DiveCANHead - Open-source controller hardware and firmware
DiveCANHud - DiveCAN HUD (in progress)

Protocol Information

For the full protocol documentation visit: https://github.com/QuickRecon/DiveCAN

Physical Layer

The DiveCAN bus is built upon the Controller Area Network (CAN) standard, operating at 125 kbps. This relatively low bit rate allows for a very resilient system even in the case of imperfect connections and terminations. The physical layer deviates from the CAN specification in a few key respects. Most critically, it uses a 560Ω termination resistance instead of the more standard 120Ω, in an effort to reduce the power consumption of the system. Rebreather electronics are battery powered, and once the microcontroller is optimised the CAN traffic actually makes up a significant portion of the energy usage for the overall system.
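A rough back-of-envelope estimate shows why the termination value matters for power. The sketch below assumes two terminators in parallel (one at each end of the bus) and a 2 V differential while a dominant bit is driven; both figures are illustrative assumptions rather than measured DiveCAN values, and the function name is made up for this example.

```python
def dominant_current_ma(termination_ohms: float, terminators: int = 2,
                        v_diff: float = 2.0) -> float:
    """Current through the bus termination while a dominant bit is driven.

    Assumes `terminators` equal resistors in parallel and a differential
    voltage of `v_diff` volts (both are assumptions for illustration).
    """
    bus_resistance = termination_ohms / terminators  # equal resistors in parallel
    return 1000 * v_diff / bus_resistance


for r in (120, 560):
    print(f"{r} ohm termination: {dominant_current_ma(r):.1f} mA")
```

Under those assumptions the dominant-state current falls from roughly 33 mA to roughly 7 mA, which is a meaningful saving for a battery-powered system.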

This battery powered nature is also why the system uses low voltage CAN, a pseudostandard mainly supported by Texas Instruments that uses a 3.3V supply for the transceiver while retaining compatibility with the conventional 5V powered bus. This works because 3.3V is high enough to establish a dominant state on the CAN High line. The linked application note from TI has more information about the nature of this implementation.

The physical connector used is an AK Industries Litebos connector in a 5 pin layout.

Power supply voltage varies. Generally 9 volts is available on control buses (as a 9V battery is common for solenoid power), but sometimes the monitor bus will use a 3.6V lithium primary cell. The DiveCAN handsets typically output 3.3V on the power line to provide a backup in case the main head power fails. This handset power can only provide about 16mA before it is cut off by overcurrent protection, so care is needed to ensure the protections aren’t tripped if running off this backup power source.

Message Format

DiveCAN uses extended CAN identifiers (29-bit) exclusively, with all messages residing in the 0x0Dxxxxxx address block. The identifier is structured as follows:

Bits 28-24: 0x0D (channel identifier)
Bits 23-16: Message type (0x00-0xCC)
Bits 15-8:  Parameter or destination ID
Bits 7-0:   Source device ID

This structure enables efficient message filtering whilst avoiding arbitration conflicts between different message types. Unlike CANopen or other higher-level protocols, DiveCAN does not implement a secondary abstraction layer; messages contain raw data with interpretation determined by the message type field. The exception to this is message type 0x0A, which carries Unified Diagnostic Services (UDS) communication running over ISO-TP. Documentation of this aspect of the interface is ongoing.
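As a minimal sketch of how such an identifier could be packed and unpacked, the code below assumes the 0x0D prefix occupies the top five bits of the 29-bit identifier and that the message type, destination, and source fields are one byte each, so an identifier reads as 0x0D followed by three bytes. The function names are hypothetical and are not taken from the project's tooling.

```python
from typing import Tuple

DIVECAN_PREFIX = 0x0D  # every DiveCAN identifier sits in the 0x0Dxxxxxx block


def pack_id(msg_type: int, dest: int, source: int) -> int:
    """Build a 29-bit identifier, assuming byte-aligned fields under the prefix."""
    for name, value in (("msg_type", msg_type), ("dest", dest), ("source", source)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte")
    return (DIVECAN_PREFIX << 24) | (msg_type << 16) | (dest << 8) | source


def unpack_id(can_id: int) -> Tuple[int, int, int]:
    """Split an identifier into (message type, destination, source)."""
    if (can_id >> 24) != DIVECAN_PREFIX:
        raise ValueError("not a DiveCAN identifier")
    return (can_id >> 16) & 0xFF, (can_id >> 8) & 0xFF, can_id & 0xFF


# PPO2 message (type 0x04) sent by device ID 4, destination left as 0 for illustration
print(hex(pack_id(0x04, 0x00, 0x04)))  # 0xd040004
```

With this layout a receiver can select a single message type by applying a mask over the prefix and type bytes, which is the kind of filtering the identifier structure is meant to make cheap.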

Device Identification

Each device on the bus is assigned a unique identifier:

ID	Device
1	Shearwater controller
2	JJ OBOE, ISC Pathfinder head
3	JJ HUD
4	JJ SOLO, Optima head
5	rEvo Battery Box

The protocol supports up to nine devices per bus, though typical installations use fewer.

Core Message Types

The reverse engineering effort has documented over twenty-five distinct message types. The most critical for PPO2 monitoring are:

PPO2 Data (0x04): Transmitted at regular intervals, this message contains the partial pressure of oxygen for each of the three oxygen cells. Values are encoded as 8-bit integers where 1.32 PPO2 corresponds to 0x84. A value of 0xFF results in a “FAIL” display on the handset.

Millivolt Readings (0x11): Contains the raw cell voltages as big-endian 16-bit integers, enabling the handset to display diagnostic information to the diver.

Cell Status (0xCA): A two-byte message containing a 3-bit status mask indicating which cells are voting, plus the consensus PPO2 value.

These three messages form a picture of the oxygen monitoring state, sent from the controller head to the handset for display.
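The encodings described above can be sketched in a few lines. The PPO2 scaling (hundredths per count, 0xFF meaning FAIL) and the big-endian 16-bit millivolt fields come from the text; the byte order inside the cell status message, the mapping of mask bits to cells, and the clamping of high readings are assumptions made for illustration, and all function names are hypothetical.

```python
import struct
from typing import List, Optional, Tuple

PPO2_FAIL = 0xFF  # rendered as "FAIL" on the handset


def encode_ppo2(ppo2: Optional[float]) -> int:
    """Encode PPO2 in hundredths as one byte; None represents a failed cell."""
    if ppo2 is None:
        return PPO2_FAIL
    raw = round(ppo2 * 100)
    # Assumption: clamp below 0xFF so a high reading is never shown as FAIL
    return max(0, min(raw, 0xFE))


def decode_ppo2(raw: int) -> Optional[float]:
    """Inverse of encode_ppo2; returns None for the FAIL marker."""
    return None if raw == PPO2_FAIL else raw / 100


def decode_millivolts(payload: bytes) -> Tuple[int, int, int]:
    """Unpack three big-endian 16-bit cell readings as raw counts."""
    return struct.unpack(">HHH", payload[:6])


def decode_cell_status(payload: bytes) -> Tuple[List[bool], Optional[float]]:
    """Assumed layout: byte 0 is the voting mask (bit 0 = cell 1), byte 1 the consensus."""
    mask, consensus = payload[0], payload[1]
    voting = [bool(mask & (1 << i)) for i in range(3)]
    return voting, decode_ppo2(consensus)


print(hex(encode_ppo2(1.32)))  # 0x84
```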

Calibration Messages: The calibration protocol uses a request-response pattern:

Calibration Request (0x13): Initiated by the handset, includes the fraction of oxygen (FO2) and atmospheric pressure
Calibration Response (0x12): Returns a status code indicating success (0x01), acknowledgement (0x05), rejection (0x08), low battery (0x10), or solenoid error (0x18)
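A handset-side view of the response codes might look like the following sketch. The numeric codes are the ones listed above; treating the acknowledgement as an "in progress" state that is later followed by a final result is an assumption, and the class and function names are hypothetical.

```python
from enum import IntEnum


class CalResponse(IntEnum):
    """Status codes carried in the Calibration Response (0x12) message."""
    SUCCESS = 0x01
    ACK = 0x05
    REJECTED = 0x08
    LOW_BATTERY = 0x10
    SOLENOID_ERROR = 0x18


def calibration_finished(code: int) -> bool:
    """Return True once a final status arrives.

    Assumption: ACK only confirms the request was received, so the caller
    keeps waiting. Undocumented codes raise ValueError rather than being guessed at.
    """
    status = CalResponse(code)
    return status is not CalResponse.ACK


print(CalResponse(0x18).name, calibration_finished(0x05))  # SOLENOID_ERROR False
```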

Additional message types support temperature monitoring, CO2 sensing, setpoint configuration, tank pressure integration, and a menu system for configuration. The complete specification is maintained in the DiveCAN repository.

Architecture Philosophy

A key insight from the reverse engineering is that DiveCAN centralises all intelligence in the controller head. The handset functions purely as a display device, rendering whatever data it receives without validation or cross-checking. For example, millivolts and reported PPO2 are not cross-compared, nor will the handset alarm/vote out an outlier cell unless it is marked as such in the cell status message. This design simplifies the handset firmware but places significant responsibility on the controller to ensure data integrity.

The controller is responsible for:

Reading and conditioning oxygen cell signals
Performing cell voting and consensus calculations
Detecting and flagging error states
Managing calibration sequences
Controlling the solenoid for oxygen injection (in active systems)

The original motivation for this project was to enable the use of DiveO2 solid state oxygen sensors without having to do a conversion to an analog signal to be compatible with the more common analog handsets available to homebuilders. This has been highly successful as having a digital integration over DiveCAN allows for the exact reported PPO2 from the cell to be displayed, and error states such as excess humidity within the cell or a detected flaw with the measurement can be reported. This is in contrast to the ‘convincing lies’ so commonly associated with galvanic oxygen cell failure.

Hardware Platform

DiveCANHead Controller

Early AVR based prototype being tested on the bench

The DiveCANHead is an open-source hardware platform designed to implement a reliable PPO2 monitor for CCR diving systems with full DiveCAN compatibility. The current hardware revision is 2.5 and is being used in a number of different rebreathers, ranging from MK15.5s and AP Classic Inspirations through to FlexCCRs.

Processor and Core Architecture

The controller is built around an STM32L4 series microcontroller. This processor family was selected for its combination of low power consumption, computational capability for PID control algorithms, and extensive peripheral integration. Originally selected for its three UART interfaces (to allow for three DiveO2 sensors), it has proven to be flexible and adaptable.

Power System

The dual-rail power architecture provides critical isolation between always-on and switchable subsystems:

VCC Rail: Powers the processor and essential functions; always active
VBUS Rail: Powers peripherals; controllable via firmware for power management

Input power can be sourced from either the CAN bus or an onboard battery. The system automatically prioritises CAN bus power when available, falling back to battery during disconnection or bus failures. While this seemed like a good idea in theory, it has caused excess power drain from handsets, so in future hardware revisions this will change to draw from the battery as priority and switch to CAN if the battery is unavailable. Firmware control of the VCC power rail will also be added.

Oxygen Sensing

The controller supports both analog galvanic cells and digital oxygen sensors:

Analog Interface: Two ADS1115 external ADCs provide differential input channels for reading the millivolt-level signals from traditional oxygen cells. Differential measurement eliminates common-mode noise that can be problematic in underwater environments.
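To give a feel for the resolution involved, the sketch below converts an ADS1115 conversion result into millivolts. The ADS1115 returns a signed 16-bit big-endian value whose step size is the selected full-scale range divided by 32768; the ±0.256 V range used here is an assumption chosen to suit millivolt-level signals, not necessarily the setting the firmware uses, and the function name is hypothetical.

```python
import struct


def ads1115_to_millivolts(raw: bytes, full_scale_v: float = 0.256) -> float:
    """Convert the 2-byte ADS1115 conversion register to millivolts.

    full_scale_v is the programmable gain range (assumed +/-0.256 V here).
    """
    (counts,) = struct.unpack(">h", raw)  # signed 16-bit, big-endian
    volts_per_count = full_scale_v / 32768
    return counts * volts_per_count * 1000


# 0x0500 = 1280 counts, about 10 mV at the assumed range
print(f"{ads1115_to_millivolts(bytes([0x05, 0x00])):.3f} mV")
```

At the assumed range each count is about 7.8 microvolts, so a cell output of a few tens of millivolts spans several thousand counts.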

Digital Interface: Three UART ports accommodate digital sensors such as the DiveO2 or Oxygen Scientific units. These sensors perform internal signal conditioning and linearisation, communicating calibrated PPO2 values directly.

The firmware implements flexible cell assignment, allowing any combination of analog and digital sensors to be configured via the menu system.

Solenoid Control

For active PPO2 control (as opposed to manual addition), the controller includes a solenoid driver based on the LMR62014 boost converter. Key specifications:

Output voltage: 12V or 6V (jumper-selectable)
Switching current: 1.3A maximum
PWM control for proportional injection

The solenoid timing is managed by a dedicated FreeRTOS task to ensure consistent oxygen delivery regardless of other system activity. Alternative PPO2 control loop implementations have also been used. One of my favorites is based on the MK15.5 analog control electronics, which simply performs a small injection periodically for as long as the measured PPO2 is below the setpoint. This causes a small “nudge” in the direction of the setpoint, and stops excessive injection when making large temporary changes in depth. The compromise is that it requires diving practice much closer to that of a manual rebreather, where the diver uses manual addition to establish the setpoint and manage large depth changes; the electronics just minimize the task loading during constant depth phases of the dive and provide a safety net.
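The MK15.5-style behaviour can be sketched as a very small state machine: at fixed intervals, fire one fixed-length pulse if the PPO2 is below the setpoint, and otherwise do nothing. This is an illustration of the idea rather than the firmware's implementation; the class name, the period, and the pulse length are all hypothetical values.

```python
from dataclasses import dataclass, field


@dataclass
class NudgeController:
    setpoint: float          # target PPO2
    period_s: float = 6.0    # time between injection opportunities (hypothetical)
    pulse_s: float = 0.2     # fixed solenoid open time (hypothetical)
    _elapsed: float = field(default=0.0, init=False, repr=False)

    def step(self, ppo2: float, dt: float) -> float:
        """Advance time by dt seconds; return the solenoid open time to fire now."""
        self._elapsed += dt
        if self._elapsed < self.period_s:
            return 0.0
        self._elapsed = 0.0
        # The pulse length never scales with the error, so a large temporary
        # PPO2 drop cannot trigger a large injection.
        return self.pulse_s if ppo2 < self.setpoint else 0.0


controller = NudgeController(setpoint=1.3)
pulses = [controller.step(ppo2=1.0, dt=1.0) for _ in range(12)]
print(pulses.count(0.2))  # 2 pulses in 12 seconds
```

Because the pulse is fixed rather than proportional to the error, the worst-case injection rate is bounded by the pulse length and period alone, which is what leaves the large corrections to the diver.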

Additional Features

SD Card Logging: All sensor readings and system events are logged for post-dive analysis
Hardware Watchdog: Independent watchdog timer ensures system recovery from firmware faults
Battery Monitoring: ADC channel dedicated to tracking battery state

Junior

A downscaled PCB referred to as the DiveCAN Controller Jr was also made. This strips away the UART interfaces, extra ADC inputs, analog outputs and complex power management to create a board with just a CAN interface, 3 analog cell inputs, and a solenoid output. The result is a much smaller and cheaper PCB that can address a very large section of use cases. Designed to fit neatly onto the back of a 9V battery, it is optimised for easy integration.

Firmware Architecture

Real-Time Operating System

The firmware is built on FreeRTOS, providing deterministic task scheduling essential for safety-critical applications. The task architecture separates concerns cleanly:

Task	Period	Function
CANTask	Event-driven	Process incoming DiveCAN messages
PPO2ControllerTask	100ms	PID loop for solenoid regulation
SolenoidFireTask	Event-driven	Execute solenoid timing sequences
PPO2TransmitterTask	200ms	Broadcast PPO2 data to dive computers
OxygenCellTask (×3)	100ms	Read individual cell values
WatchdogTask	500ms	Refresh hardware watchdog

This architecture ensures that critical functions, particularly PPO2 transmission and watchdog refresh, execute reliably even under heavy processing load.

Module Organisation

The firmware source is organised into logical modules:

Core/Src/
├── DiveCAN/
│   ├── DiveCAN.c         # Main message handling
│   ├── Transceiver.c     # Low-level CAN TX/RX
│   ├── PPO2Transmitter.c # Periodic broadcast
│   └── menu.c            # Configuration menu system
├── Sensors/
│   ├── OxygenCell.c      # Cell voting logic
│   ├── AnalogOxygen.c    # Galvanic cell driver
│   ├── DiveO2.c          # Digital sensor UART
│   └── OxygenScientific.c # O2S sensor driver
├── PPO2Control/
│   └── PPO2Control.c     # PID solenoid controller
└── Hardware/
    ├── ext_adc.c         # External ADC interface
    ├── solenoid.c        # Boost converter control
    ├── pwr_management.c  # Power rail management
    ├── flash.c           # EEPROM emulation
    └── log.c             # SD card logging

Testing Infrastructure

Static Analysis

Because reliability is paramount and I am working mainly as a single developer, significant work has gone into finding as many defects as possible at compile time. This is achieved through a number of layers, the first of which is a very verbose set of GCC flags:

WARN_FLAGS = -Wall -Wwrite-strings -Wextra \
 -Wduplicated-cond -Wlogical-op -Wrestrict \
 -Wnull-dereference -Wcast-align -Wsequence-point \
 -Wcast-qual -Wuninitialized -Wchar-subscripts \
 -Wstringop-truncation -Wunused-parameter \
 -Wunused-value -Wjump-misses-init -Wshadow \
 -Wswitch -Wundef -Wswitch-default -Wstrict-prototypes \
 -Wtrigraphs -Wformat -Wformat-security -Wimplicit \
 -Wcomment -Wunreachable-code -Wreturn-type -Wno-unused-parameter \
 -Werror=vla -D_FORTIFY_SOURCE=3 -D__STDC_FORMAT_MACROS \
 -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -D_GLIBCXX_ASSERTIONS \
 -fanalyzer \
 -pedantic \
 -fdump-analyzer-callgraph -fdump-analyzer-supergraph \
 -ftrivial-auto-var-init=pattern -fstack-protector-strong \
 -fstack-clash-protection -fharden-compares -fharden-conditional-branches \
 -Wstack-usage=1305 -Wstack-protector -Wunsuffixed-float-constants

There is quite a bit going on there, and it can be summarized briefly as “-Wall doesn’t actually turn on all the warnings”. GCC has remarkably potent static analysis built in, which makes sense given a large part of its job is optimising code based on its behavior. The pile of extra warnings takes advantage of that to detect null dereferences, impossible conditionals, and other badness.

This also let me inflict some of my own opinions on C onto the codebase, such as ensuring any use of Variable Length Arrays (VLAs) results in an error. The observant might also note that I set both -Wunused-parameter and -Wno-unused-parameter. This is because, while I want to have the warning enabled eventually, at this stage a large development effort would be needed to eliminate unused parameters, as they appear in a lot of places relating to interrupt handlers and RTOS tasks, where function signatures have to be matched.

Probably the most overlooked feature in GCC is -fanalyzer, which is actually a full static analysis subsystem which can catch double frees, double closes, use after frees, and a bunch of the other classic gotchas of low level programming. One of the design rules for this project is to not use the heap, so a lot of it isn’t relevant, but it will catch null arguments, null dereferencing, stale stack pointers, and all sorts of other badness.

The other fdump flags get GCC to dump out a large amount of its internal state, which is then used by a Python script to approximate the worst case stack sizes needed for each task. This, when combined with debuggers to analyse the high stack watermarks in FreeRTOS, informs the stack size allocated to each task. I’m also using the -Wstack-usage limit to ensure that if any function slips above my maximum expected stack usage, the build flags it, prompting a review of why the stack usage has become so large.
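The core of such a worst-case estimate is a walk over the call graph that adds each function's frame size to the deepest chain of callees beneath it. The sketch below shows only that step, using hand-written dictionaries in place of parsed GCC output; the function names and byte counts are invented for illustration, and the project's actual script may work differently.

```python
def worst_case_stack(func, frame_bytes, calls, _path=()):
    """Deepest cumulative stack use reachable from func, in bytes."""
    if func in _path:
        # A cycle means static analysis cannot bound the depth
        raise ValueError(f"recursion via {func}: stack depth is unbounded")
    deepest_callee = max(
        (worst_case_stack(callee, frame_bytes, calls, _path + (func,))
         for callee in calls.get(func, ())),
        default=0,
    )
    return frame_bytes[func] + deepest_callee


# Hypothetical per-function frame sizes and call graph
frame_bytes = {"CANTask": 64, "ProcessMessage": 120, "SendReply": 48, "LogEvent": 200}
calls = {"CANTask": ["ProcessMessage"], "ProcessMessage": ["SendReply", "LogEvent"]}

print(worst_case_stack("CANTask", frame_bytes, calls))  # 384
```

A walk like this cannot see calls through function pointers or the stack consumed by interrupts, which is one reason to treat the result as an approximation and cross-check it against the FreeRTOS high watermarks.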

As a final layer of static analysis on top of what is already covered, the project also uses SonarQube to catch code smells and other sources of technical debt.

Unit Testing

The firmware includes a unit test suite built on the CppUTest framework. Mock implementations of FreeRTOS and HAL functions enable testing of application logic in isolation:

PPO2 control algorithms
Cell voting and consensus
Configuration management
Message parsing and generation

Tests are executed on the development host, catching logic errors before hardware deployment. The main role of unit tests in this context is to allow for the debugging in-place of complex application logic, for example the ISO-TP/UDS implementations, the configuration processing, and the voting logic.

Hardware Integration Testing

A Python-based test harness enables automated testing against real hardware. The test stand comprises:

Arduino Due running a hardware shim simulating cells and bus behavior
CAN interface for message injection and monitoring
Programmable power supplies for supply voltage testing across both CAN power and battery power.

Test scenarios cover:

Calibration sequences across voltage ranges
PPO2 control loop response
Power mode transitions
Menu system navigation
Error condition handling

Test stand showing a direct connection to headers on the board

The technical implementation centers around a pytest core that sequences everything else around it. One of the core tenets of the board is that it should be highly configurable; the problem with this is that it results in a large configuration space that must be tested across. In the name of robustness, the pytest framework enumerates the valid configurations and runs each test against each configuration. In the case of the PPO2 and millivolt tests, which also sweep a set of values, this can result in thousands of tests.
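The growth in test count is easy to see with a small enumeration. The option names, the validity rule, and the test points below are all hypothetical stand-ins for the real configuration space; the sketch only shows how a cross product of settings multiplies with a sweep of values.

```python
from itertools import product

CELL_TYPES = ("analog", "diveo2", "o2s")  # hypothetical option names
POWER_MODES = ("battery", "can")          # hypothetical option names


def is_valid(cells, power_mode):
    """Placeholder rule: accept everything. Real constraints would go here."""
    return True


def enumerate_configs():
    """Yield every valid combination of three cell types and a power mode."""
    for cells in product(CELL_TYPES, repeat=3):
        for power_mode in POWER_MODES:
            if is_valid(cells, power_mode):
                yield {"cells": cells, "power": power_mode}


configs = list(enumerate_configs())
ppo2_values = [0.21, 0.7, 1.0, 1.3, 1.6]  # hypothetical test points

print(len(configs), len(configs) * len(ppo2_values))  # 54 270
```

A list like `configs` can be handed to `pytest.mark.parametrize` so that each combination appears as its own test case. Even this toy space yields 270 cases for a single swept test, so adding a few more configuration options quickly reaches the thousands.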

This poses a compromise between robustness and testing frequency, as the full suite of hardware tests currently takes many hours to run, a problem that will only get worse as the configuration possibilities expand. One way this could be addressed is to acknowledge that the vast majority of configurations share the same code paths, and focus on testing the outliers rather than robustly testing each configuration. However, this would require the integration testing framework to make assumptions about the state of the tested device’s implementation, so it is a step not currently taken.

Aside from the issue of dealing with the large testing space, the pytest framework also spools up a set of objects used by the tests to interface with the real world. Most critical is the divecan_client object, which is how the Python tests read the CAN messages from the board, one of its most critical outputs. This effectively just abstracts a pycan device using a USB to CAN adaptor to pull the expected message off of the queue.

The next object that gets created is the HWShim, which is the internal name for the Arduino Due that simulates inputs to the board. The Due was chosen for its 3 UARTs and number of PWM pins. The UARTs allow us to simulate digital cells, and applying a low pass filter to the PWM pins creates a simple DAC that allows for the simulation of the output of galvanic oxygen cells. A few weaknesses have been identified in this approach though. While the STM32 used in the head has native support for the 1 wire UART interface used by Oxygen Scientific Greenflash cells, the same cannot be said for the Due, which at this stage is unable to effectively simulate these cells without external hardware being added and removed for each test to tie the RX and TX lines together. The PWM DAC is also not without its problems: the slow response time of the RC circuit it forms means that an already slow testing process takes even longer, as one has to wait for the DAC to slew to the new value before a reliable reading can be taken.
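The wait imposed by the RC-filtered PWM output can be estimated from the first-order step response, where the remaining error decays as exp(-t/RC). The sketch below solves that for the time needed to settle within a given fraction of the step; the resistor and capacitor values are hypothetical and are not the values used on the test stand.

```python
import math


def settle_time_s(r_ohms: float, c_farads: float, tolerance: float) -> float:
    """Time for a first-order RC filter to settle within `tolerance` of a step.

    tolerance is the remaining error as a fraction of the step size,
    so 0.001 means settled to within 0.1 percent.
    """
    return -r_ohms * c_farads * math.log(tolerance)


# Hypothetical filter: 10 kilohm and 10 microfarad gives RC = 0.1 s
print(f"{settle_time_s(10_000, 10e-6, 0.001):.2f} s")  # 0.69 s
```

A heavier filter reduces PWM ripple but lengthens this wait in direct proportion, and the delay is paid on every value change across every test.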

Lastly, the Python tests also spool up a psu object, which abstracts the two power supplies used for the boards so that the boards can be tested for accuracy in power monitoring, and also at the high and low limits of their input ranges, to ensure safe operation even when running right to (and a bit beyond) the specified limits.

Test stand showing a connection to the board via a bed of nails testing board

With these interfaces in concert it is possible to validate almost all of the externally observable behaviors of the board. Everything from configuration and calibration through to the accuracy of the ADC measurements and PPO2 voting logic is tested. At this stage the only functional behavior not tested is the solenoid output. This is because, particularly with the stateful PID loop, it is difficult to conceive of a test which is reflective of what the board will actually see in the wild. There has been some effort towards a rebreather software simulator, so that the board’s responses to different oxygen injection rates, time constants, and breathing/depth change rates can be examined on the bench, but at this stage these efforts are incomplete.

Actually going for a dive

At the end of the day, all of the bench testing in the world can only get you so far, and the final stage of testing this kind of hardware is hooking it up to a rebreather and going for a dive. For this I have a dedicated AP Inspiration Classic which has been modified with all of the original electronics removed to make room for the DiveCAN board, as well as a backup analog cell monitor.

Inspiration CCR Head with DiveCAN board installed

The Inspiration Classic was chosen as a testbed unit because they’re widely available, cheap to obtain, and have a very roomy head, so it is possible to install hardware with quick change connections and other functionality to make development easier.

Taking the system swimming allows for testing against a whole range of conditions that are hard or impossible to accurately simulate on the lab bench. A notable case occurred very early in the wet testing process when water on the SD card caused it to fail to write in a way that caused the SD driver to never return. This caused a number of issues to become apparent simultaneously:

This occurred during a blocking call.
The constraint on SD card timeouts was excessively long.
The watchdog timer was not active, due to having been disabled for debugging efforts earlier in development.

The last of these issues is the most egregious: not having a watchdog timer means that the last line of defense for firmware issues (a hard reset if a task does not complete in a timely manner) was not present. This was due simply to it being turned off to aid in a debugging effort, and not being turned back on before the firmware was deployed into the field. SD card timeouts were adjusted and the way the firmware handles hardware failure was tweaked, but the lack of a watchdog while diving is too risky to ever have repeated, as it left me with no PPO2 indication (though my backup instruments were working) and a solenoid that was stuck open, injecting excess oxygen. As this was early in the testing process, this was at a shallow depth where it did not present an imminently hazardous condition, but it was a fail-dangerous state.

To address this, I researched the IWDG (Independent Watchdog) peripheral further and found two key features that could be used to prevent this from being an issue. The first is to set the watchdog to pause when a debugger pauses execution, which prevents the spurious resets that led to the watchdog being disabled in the first place. The second is to set the Option Bytes (sometimes referred to as fuses by other microcontroller vendors) to run the IWDG from startup, meaning it is not possible to simply disable it in firmware. The current firmware also asserts that these option bytes are set and sets them if they are not. This ensures that the last line of defense against a software defect remains present.

Since then, years of development have taken place, and the board has proven itself to be robust and reliable in a number of units while facing a number of real-world challenges such as power failures, cell failures, and far larger solenoids than originally anticipated. In all cases it has performed as expected.

Future Development

Development continues on several fronts:

UDS Diagnostic Layer: Shearwater handsets have a poorly documented bridge for communicating with the DiveCAN bus using ISO-TP and UDS. Implementing this interface allows for much deeper configuration and monitoring.

Web configuration/monitoring: Using the UDS bridge and the Bluetooth APIs present in Chrome and Chrome-derived browsers, it is possible to access the board’s UDS interface from the browser. Building out a web tool makes it possible for much deeper diagnostics to be performed in-situ.

Additional hardware interfaces: There is demand for CO2 monitoring, temperature sticks, and high pressure gas monitoring to be done via the handset, and the board has the physical connections required to accommodate these additional peripherals. The software and testing effort just needs to be undertaken to realize these functions.