Decoding the Complexities of Janus WebRTC Server Recordings

In today’s digital landscape, the demand for robust communication solutions is paramount. At our company, we turned to the Janus WebRTC Server to bridge WebRTC and telephony (PBX) connections, so that our agents could call leads as part of our sales efforts. Various WebRTC solutions exist, but among the open-source options we found none that matched the capabilities of Janus. This article describes the architecture of our Janus setup and some of the challenges we faced, particularly in recording WebRTC calls.

Architecture 1

We implemented a fully functional WebRTC-to-telephony system using Janus and its SIP plugin. This setup allowed our agents to make calls to leads on their traditional phones. At the heart of our architecture lies the unique recording format used by Janus, known as MJR.

MJR Mayhem

Janus takes a simple approach to recording media: the server stores the raw RTP packets, together with some header information, in its own format called MJR. Initially, we used the native recording support of the SIP plugin, which creates a separate recording for each party in a call. This meant that for every SIP call, we received two MJR files:

  • somefilename-user-audio.mjr
  • somefilename-peer-audio.mjr

These files could be named dynamically based on the call using the SIP plugin API. Since our focus was solely on audio, we had a straightforward two-file system for each SIP call.
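As a rough illustration of the dynamic naming, the sketch below prints the JSON body of a SIP plugin "recording" request. The field names follow our reading of the SIP plugin documentation, the helper name sip_recording_request and the path /recordings/call-1234 are hypothetical, and the filename is not JSON-escaped, so treat this as a starting point rather than a finished client.

```shell
# Print the JSON body of a SIP plugin "recording" request.
# $1: base path/filename; Janus appends the per-leg suffixes
# (-user-audio.mjr, -peer-audio.mjr) itself.
# Note: the filename is inserted as-is, without JSON escaping.
sip_recording_request() {
  printf '{"request":"recording","action":"start","audio":true,"peer_audio":true,"filename":"%s"}\n' "$1"
}

# Hypothetical base name for one call
sip_recording_request "/recordings/call-1234"
```

The body would be sent as a message on the SIP plugin handle once the call is up; how it is sent depends on the client library in use.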

However, the MJR format isn’t compatible with most media players or browsers, necessitating conversion to a more widely accepted format, such as WAV. To achieve this, Janus provides a command-line tool called janus post-processing (janus-pp-rec), which can convert MJR files into WAV format:

janus-pp-rec somefilename-user-audio.mjr somefilename-user-audio.wav 
janus-pp-rec somefilename-peer-audio.mjr somefilename-peer-audio.wav
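Running the two commands by hand does not scale past a few calls. A minimal sketch of a batch conversion follows; the function names wav_name_for and convert_dir and the DRY_RUN switch are our own illustrative choices, not part of Janus.

```shell
# Derive the output name: call-user-audio.mjr -> call-user-audio.wav
wav_name_for() {
  printf '%s.wav\n' "${1%.mjr}"
}

# Convert every .mjr file in a directory with janus-pp-rec.
# Set DRY_RUN=1 to print the commands instead of running them.
convert_dir() {
  local dir="$1" mjr
  for mjr in "$dir"/*.mjr; do
    [ -e "$mjr" ] || continue   # no matches: the glob stays literal, skip it
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "janus-pp-rec $mjr $(wav_name_for "$mjr")"
    else
      janus-pp-rec "$mjr" "$(wav_name_for "$mjr")"
    fi
  done
}
```

Calling convert_dir with the recordings directory as its argument converts everything in it; with DRY_RUN=1 set, it only lists what it would do.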

Final Recording

Once we had the WAV files, we needed to combine them. Mixing the two tracks into a single file gives us one recording per call in which both the agent and the lead can be heard. We used ffmpeg and its amix filter to do this:

ffmpeg -i somefilename-user-audio.wav -i somefilename-peer-audio.wav \
       -filter_complex "[0:a][1:a]amix=inputs=2:duration=longest:dropout_transition=3" \
       -c:a libopus somefilename.opus

We opted for the Opus format instead of WAV for the final output. Opus is a highly efficient, lossy audio codec that excels in streaming applications. Because it is compressed, its files are much smaller than uncompressed WAV files. This choice became particularly important when we noticed a staggering $7,000 bill from Google Cloud Storage caused by large file sizes.
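To get a feel for the difference, here is a back-of-the-envelope estimate. The numbers are assumptions for illustration, not measurements from our system: telephony-style PCM at 8 kHz, 16-bit, mono for the WAV side, and a nominal 24 kbit/s for Opus (libopus encodes with a variable bitrate by default, and container overhead is ignored).

```shell
# Bytes of uncompressed PCM: rate * (bits / 8) * channels * seconds.
pcm_bytes() {
  local rate="$1" bits="$2" channels="$3" seconds="$4"
  echo $(( rate * bits / 8 * channels * seconds ))
}

# Bytes of a constant-bitrate stream: (bits per second / 8) * seconds.
cbr_bytes() {
  local bitrate="$1" seconds="$2"
  echo $(( bitrate / 8 * seconds ))
}

pcm_bytes 8000 16 1 3600   # one hour of WAV audio data: 57600000 bytes
cbr_bytes 24000 3600       # one hour of Opus at 24 kbit/s: 10800000 bytes
```

Under these assumptions an hour of audio shrinks from roughly 58 MB to roughly 11 MB, and the gap widens at higher WAV sample rates.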

The Downsides

Despite the initial success of our implementation, we ran into significant problems. Some recordings ballooned to 2 GB, with reported durations of up to 72 hours even though the actual calls were much shorter. The problem arose in the conversion step: we were running janus-pp-rec without the options it provides for handling these edge cases.
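A cheap way to catch such files before they are uploaded is to estimate the duration from the file size. The sketch below assumes a plain 44-byte WAV header and 8 kHz, 16-bit mono PCM (16000 bytes per second); both are assumptions about the converter output, and the function names are hypothetical. A tool such as ffprobe would give an exact figure.

```shell
# Estimate the duration in seconds implied by a WAV file's size.
# Assumes a 44-byte header and 16000 bytes of audio per second.
wav_seconds() {
  local size
  size=$(wc -c < "$1")
  echo $(( (size - 44) / 16000 ))
}

# Succeed (exit status 0) when the file is longer than the allowed maximum.
# $1: WAV file, $2: maximum plausible call length in seconds.
is_suspicious() {
  local file="$1" max_seconds="$2"
  [ "$(wav_seconds "$file")" -gt "$max_seconds" ]
}
```

For example, `is_suspicious somefilename-peer-audio.wav 14400` would flag any recording that claims to be longer than four hours, a threshold picked here purely for illustration.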

The Fix

Fortunately, the solution was straightforward. By passing two optional arguments, --ignore-first (-i) and --audioskew (-S), when running the conversion tool, we could correct the recording sizes and durations:

janus-pp-rec -i 20 -S 5000 somefilename-peer-audio.mjr somefilename-peer-audio.wav

# From the janus-pp-rec help output:
-S, --audioskew=milliseconds  Time threshold to trigger an audio skew
                                compensation, disabled if 0 (default=0)
-i, --ignore-first=count      Number of first packets to ignore when
                                processing, e.g., in case they're cause of
                                issues (default=0)

This fix allowed us to streamline our audio processing pipeline, ensuring that recordings maintained their intended size and duration.
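Put together, the per-call pipeline can be sketched as a small script. The values 20 and 5000 are the ones used above; the variable names, the run and process_call helpers, and the DRY_RUN switch are illustrative choices rather than anything Janus or ffmpeg prescribes.

```shell
# Values used in this article; override through the environment if needed.
IGNORE_FIRST="${IGNORE_FIRST:-20}"
AUDIOSKEW_MS="${AUDIOSKEW_MS:-5000}"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Convert both legs of one call, then mix them into a single Opus file.
# $1: shared file name prefix, e.g. "somefilename".
process_call() {
  local base="$1" leg
  for leg in user peer; do
    run janus-pp-rec -i "$IGNORE_FIRST" -S "$AUDIOSKEW_MS" \
      "$base-$leg-audio.mjr" "$base-$leg-audio.wav" || return 1
  done
  run ffmpeg -i "$base-user-audio.wav" -i "$base-peer-audio.wav" \
    -filter_complex "[0:a][1:a]amix=inputs=2:duration=longest:dropout_transition=3" \
    -c:a libopus "$base.opus"
}
```

Running `DRY_RUN=1 process_call somefilename` prints the three commands without executing them, which is a convenient way to check the file names before touching real recordings.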

Architecture 2

After implementing our initial recording system using the Janus WebRTC Server and its SIP plugin, we discovered another Janus plugin called audiobridge. This plugin supports WAV-based recording out of the box, which prompted us to reassess our approach to audio recording.

Integrating Audiobridge

Given that we were already utilizing the audiobridge room plugin for whisper and spy functionalities, integrating SIP audio tracks into an audiobridge room became the logical next step. The audiobridge plugin automatically generates a final.wav file for the room, providing a streamlined solution for recording our audio calls.

Optimization with Opus

To further optimize our audio files, we converted the final.wav file produced by the audiobridge into the Opus format. This reduced the file size while maintaining high audio quality, which matters for both storage costs and streaming performance.
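The conversion itself is a single ffmpeg call. In the sketch below, the wav_to_opus helper, the 24k default bitrate and the voip application mode are illustrative assumptions suited to speech, not settings taken from our production setup.

```shell
# Encode a WAV file to Opus next to the original.
# $1: input WAV, $2: target bitrate (default 24k, an illustrative value).
wav_to_opus() {
  local wav="$1" bitrate="${2:-24k}"
  # -application voip asks libopus to favor speech intelligibility
  ffmpeg -i "$wav" -c:a libopus -b:a "$bitrate" -application voip "${wav%.wav}.opus"
}
```

Calling `wav_to_opus final.wav` writes final.opus alongside the room recording; a second argument such as 32k overrides the bitrate.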

Conclusion

Switching to the audiobridge plugin simplified our recording process significantly. The automatic generation of WAV files meant less manual intervention, while the seamless integration of our SIP audio tracks enhanced functionality. By adopting Opus for our final audio files, we achieved an efficient solution that meets our operational needs without compromising on quality. This architecture shift not only improved our system’s efficiency but also set the foundation for scalable future developments in our WebRTC implementation with Janus.