nheko/PLAN-calls.md
2026-03-11 00:44:53 +02:00

5.7 KiB

Stabilize Existing VoIP Call Flow and Add Non-Fatal Diagnostics

Summary

Tighten the existing Matrix call flow in the current VoIP stack, focusing on real correctness issues already present in src/voip/CallManager.cpp and src/voip/WebRTCSession.cpp, and add structured logging around non-fatal failures so future call bugs are diagnosable without crashing or silently retrying forever.

This plan targets:

  • Core invite/answer/candidate/select-answer stability
  • Inbound renegotiation support for m.call.negotiate
  • Logging of non-critical exceptions and ignored/invalid call events
  • No user-visible feature expansion beyond making current calls more reliable

Implementation Changes

  • Fix the CallSelectAnswer rejection lookup bug in CallManager::handleEvent(const RoomEvent&) by changing the std::find(begin, begin, ...) calls to search through end(). This is a concrete logic bug that currently prevents expected branch behavior.
  • Add strict call-id and state guards across inbound handlers in CallManager and log every early-return branch that currently drops events silently. Include at least:
    • mismatched call_id
    • wrong local webrtc::State
    • duplicate answers/select-answer/reject
    • self-echo cases
    • candidates received before a call is initialized
  • Implement WebRTCSession::acceptNegotiation(const std::string &) instead of the current unconditional false. Use the same parsing/validation style as acceptOffer:
    • require an active session, not DISCONNECTED
    • parse SDP as a remote offer
    • validate required audio media and optional VP8 video media
    • update remote video direction flags
    • call set-remote-description
    • create and send an answer through the existing answerCreated path
    • keep wire format unchanged; do not add local renegotiation initiation in this pass
  • Extend CallManager::handleEvent(const RoomEvent&) so renegotiation only applies to the active call, logs dropped mismatches, and drains queued ICE candidates only after renegotiation is accepted.
  • Harden acceptICECandidates handling:
    • log when candidates are ignored because the session is not ready
    • keep buffering only in manager-side inbound invite/renegotiation waiting states
    • avoid silent no-ops when candidates arrive for the wrong call or after teardown
  • Improve failure handling around setup paths:
    • log TURN retrieval failures in retrieveTurnServer() with the request error and retry interval
    • log why createOffer, acceptOffer, acceptAnswer, and renegotiation fail, including current state and call id where available
    • log pipeline creation failures with the missing device/element context instead of only generic “Problem setting up call”
  • Add non-critical exception logging around call orchestration paths that touch cache/state but should not kill the process:
    • syncEvent dispatch
    • inbound call event handlers that query room info/members
    • outbound invite setup that reads room/member data
    • TURN response parsing/conversion Catch std::exception and log with nhlog at warn or error depending on whether the call can continue; rethrow nothing for these paths. Add a final catch (...) only where needed to prevent silent termination, with a clear “unknown exception” log line.
  • Keep user notifications mostly unchanged. Prefer logs for diagnostics and only surface UI notifications for existing fatal outcomes such as setup failure, ICE failure, reject, timeout, or answered elsewhere.

Public Interfaces / Behavior

  • No QML property or signal changes.
  • No Matrix event schema changes.
  • Existing CallAnswer, CallCandidates, CallNegotiate, and CallSelectAnswer messages remain the transport surface.
  • Behavioral change: inbound m.call.negotiate will now be handled instead of always failing.
  • Behavioral change: previously silent dropped call events and recoverable exceptions will now emit logs with call id, party id, and local state.

Test Plan

  • Add focused unit-style coverage where practical for pure call-manager logic, or otherwise add regression tests around isolated helper logic if the repo lacks VoIP integration tests.
  • Verify manually or with targeted harness coverage:
    • outbound call reaches OFFERSENT, receives answer, transitions to CONNECTED
    • inbound call invite buffers ICE before accept, then drains after accept
    • duplicate CallAnswer and CallSelectAnswer are ignored with logs, not state corruption
    • CallSelectAnswer rejection-party matching now works after the std::find fix
    • incoming CallNegotiate on the active call produces a local answer instead of immediate failure
    • CallNegotiate for a different call_id is ignored with a diagnostic log
    • TURN retrieval failure logs retry details without crashing
    • exceptions thrown from room/member lookup in call paths are logged and do not abort the process
  • Run a build check after the changes. If the environment still lacks cmake, document that verification gap and at minimum run the strongest available static inspection.

Assumptions

  • Use existing nhlog logging categories rather than raw printf; “prints” here means persistent diagnostic logging.
  • Keep the existing user-facing notification policy unless a failure is already fatal to the active call.
  • Renegotiation scope is inbound-only for now because the current codebase already receives m.call.negotiate but does not expose a local renegotiation initiator.
  • No attempt will be made in this pass to redesign the entire call state machine or add automatic media-session recovery after teardown.