This project is being developed under the SDR Makerspace activity *DVB-S2 LDPC SIMD* by the **ReDS Institute** of Yverdon-les-Bains in Switzerland.

# Introduction
LDPC decoding is the most computationally demanding task in a DVB-S2 receiver chain. This work aims to provide an optimized decoder for DVB-S2 LDPC codewords. The decoder must offer error-correction performance close to the state of the art, together with high throughput and low latency.

This repository hosts two software components:

- A testing environment to validate the developed decoders
- A library of optimized LDPC decoders

The [testing environment](https://github.com/blegal/Fast_LDPC_decoder_for_x86) was developed by *Bertrand Legal et al.* to validate their own optimized LDPC decoders and is reused for this activity.
It serves two main purposes: benchmarking the throughput of LDPC decoders and testing their error-correction performance.

The library is a collection of LDPC decoders optimized with vector operations for x86 processors (SSE 1-4). It decodes DVB-S2 codes with code rates *1/2*, *8/9* and *9/10*, on long codewords only (64800-bit codewords). It offers two decoding strategies, **flooded** and **layered**, both of which were investigated for their performance.

# How to use
## Library
In the root directory of the project, run:

```
mkdir -p build
cd build
cmake ..
make -j4
sudo make install
```

The header files and the library itself should then be installed in `/usr/local/include/` and `/usr/local/lib/` respectively.

To develop with the library, include the following header:

```c
#include <reds-dvb-decoder.h>
```

First declare and initialize a decoder:

```c
REDS_DVB_DECODER_Decoder_t *decoder;
REDS_DVB_DECODER_Scheduler_t *scheduler = REDS_DVB_DECODER_SCHEDULER_FLOODED;
REDS_DVB_DECODER_Code_t *code = REDS_DVB_DECODER_CODE_DVB_S2_64800_32400;
REDS_DVB_DECODER_Error_t error;

/* Initialize the decoder with the chosen scheduler and code;
 * the returned status is stored in `error`. */
error = REDS_DVB_DECODER_Init(decoder, scheduler, code);
```

Several schedulers and codes can be passed to the decoder initializer. The available schedulers are:

- `REDS_DVB_DECODER_SCHEDULER_FLOODED`
- `REDS_DVB_DECODER_SCHEDULER_LAYERED`

The available codes are:

- `REDS_DVB_DECODER_CODE_DVB_S2_64800_32400`
- `REDS_DVB_DECODER_CODE_DVB_S2_64800_7200`
- `REDS_DVB_DECODER_CODE_DVB_S2_64800_6480`

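The numeric suffixes of the code identifiers above appear to give the codeword length followed by the number of parity bits. This is an inference from the supported code rates (1/2, 8/9 and 9/10), not something the library states. Under that assumption, the following sketch recovers the number of information bits and the reduced code rate. The type `code_dims_t` and the helper functions are hypothetical and not part of the library. For instance, `code_rate` applied to `{64800, 7200}` yields 8/9.

```c
#include <assert.h>

/* Hypothetical description of a code, read off the identifier suffix
 * (for example 64800_7200): n is the codeword length in bits and m is
 * ASSUMED to be the number of parity bits. */
typedef struct {
    int n;
    int m;
} code_dims_t;

/* Greatest common divisor (Euclid's algorithm). */
static int gcd_int(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Number of information bits k = n - m. */
static int info_bits(code_dims_t c)
{
    assert(c.m > 0 && c.n > c.m);
    return c.n - c.m;
}

/* Code rate k/n as a reduced fraction num/den. */
static void code_rate(code_dims_t c, int *num, int *den)
{
    int k = info_bits(c);
    int g = gcd_int(k, c.n);
    *num = k / g;
    *den = c.n / g;
}
```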
Once the decoder is initialized, pass it to the decode function:

```c
#define CODEWORD_LEN 64800

REDS_DVB_DECODER_Decoder_t *decoder; /* initialized as shown above */
char soft_bits[CODEWORD_LEN];
char hard_bits[CODEWORD_LEN];
int nb_iter = 10;

/*
 * Fetch a codeword of soft bits encoded as signed 8-bit values
 */

REDS_DVB_DECODER_Decode(decoder, soft_bits, hard_bits, nb_iter);

/*
 * Each byte of hard_bits now holds one decoded hard bit, either 0 or 1
 */
```

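The decode call expects soft bits as signed 8-bit values, but a demodulator often produces floating-point log-likelihood ratios. The sketch below shows one possible way to quantize them with saturation. The function `quantize_llr` and its `scale` parameter are hypothetical. The scaling and sign convention the library expects are not documented here, so treat both as assumptions to verify. The buffer uses `signed char`, so it may need a cast when passed as `char *`.

```c
#include <assert.h>
#include <stddef.h>

/* Convert floating-point LLRs to saturated signed 8-bit soft bits.
 * `scale` is a hypothetical tuning factor, not a library parameter.
 * Values are clamped to [-127, 127] to keep the range symmetric,
 * then rounded to the nearest integer. */
static void quantize_llr(const float *llr, signed char *soft_bits,
                         size_t len, float scale)
{
    assert(scale > 0.0f);
    for (size_t i = 0; i < len; i++) {
        float v = llr[i] * scale;
        if (v > 127.0f)
            v = 127.0f;
        if (v < -127.0f)
            v = -127.0f;
        /* Round half away from zero without needing libm. */
        soft_bits[i] = (signed char)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}
```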
Note that the **soft_bits and hard_bits arrays must have the correct length** for the code used. For example, with a 64800x32400 code, each array must hold 64800 elements, i.e. 64800 * sizeof(char) bytes. Failing to do so leads to undefined behavior.

Once you are done with the decoder, free it with:

```c
REDS_DVB_DECODER_Terminate(decoder);
```

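The decoder returns one hard bit per byte, which is convenient for inspection but wasteful for storage or transmission. As a minimal sketch, the hypothetical helper below packs such an array into bytes, most significant bit first. The bit order is an assumption chosen for illustration; the library does not prescribe one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack an array holding one hard bit (0 or 1) per byte into a compact
 * byte array, most significant bit first.
 * `packed` must provide room for nbits / 8 bytes. */
static void pack_hard_bits(const char *hard_bits, size_t nbits,
                           unsigned char *packed)
{
    assert(nbits % 8 == 0);
    memset(packed, 0, nbits / 8);
    for (size_t i = 0; i < nbits; i++) {
        if (hard_bits[i] & 1)
            packed[i / 8] |= (unsigned char)(0x80u >> (i % 8));
    }
}
```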
## Testing environment

**The library is integrated in the testing environment**. To evaluate the performance of the library's decoders with the testing environment, first compile it:

```
mkdir -p build
cd build
cmake ..
make -j4
```

Change directory to `Fast_LDPC_decoder_for_x86`:

```
cd Fast_LDPC_decoder_for_x86
```

After compiling the testing environment, run the executable *main.icc* as follows:

`./main.icc -fixed -<implementation> -MS -iter <num iter> -thread <num threads> -encoder -min <min SNR> -max <max SNR>`

There are multiple implementations to choose from:

- layered (baseline)
- layered-sse (optimized)
- flooded (baseline)
- flooded-sse (optimized)

For example, to run the *layered-sse* implementation on 4 threads, with 10 iterations and an SNR range of 1 to 5 dB, execute the command:

`./main.icc -fixed -layered-sse -MS -iter 10 -thread 4 -encoder -min 1.0 -max 5.0`

Optionally, you can specify a time constraint so that the environment runs for the given amount of time. This also activates the throughput measurements. To do so, append the argument `-timer <time in seconds>`.

# Performance
## Error correction

As seen below, the error-correction performance is highly dependent on both:

- the code rate used
- the number of decoding iterations

The data used for the plots was extracted from the testing environment by varying the number of iterations, the code rate and the decoding strategy. Note that **OMS 0** is a reference decoder already integrated in the testing environment. It uses layered scheduling with Offset-Min-Sum belief propagation and an offset of 0.

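To illustrate what an Offset-Min-Sum update computes, here is a minimal scalar sketch of a single check-node update. It is not the code of the reference decoder or of this library, and the function name `oms_check_node` is hypothetical. Each outgoing message takes the product of the signs of all other incoming messages and the smallest of their magnitudes, reduced by the offset and floored at zero. With an offset of 0, as in **OMS 0**, this reduces to plain Min-Sum.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Offset-Min-Sum update for one check node of the given degree.
 * in[i]  : message received from the i-th connected variable node
 * out[i] : message sent back to the i-th connected variable node */
static void oms_check_node(const int *in, int *out, int degree, int offset)
{
    assert(degree >= 2 && offset >= 0);
    int min1 = INT_MAX; /* smallest magnitude */
    int min2 = INT_MAX; /* second smallest magnitude */
    int min_idx = 0;    /* position of the smallest magnitude */
    int sign = 1;       /* product of all input signs */

    for (int i = 0; i < degree; i++) {
        int mag = abs(in[i]);
        if (in[i] < 0)
            sign = -sign;
        if (mag < min1) {
            min2 = min1;
            min1 = mag;
            min_idx = i;
        } else if (mag < min2) {
            min2 = mag;
        }
    }

    for (int i = 0; i < degree; i++) {
        /* Minimum over all inputs except the i-th one. */
        int mag = (i == min_idx) ? min2 : min1;
        mag -= offset;
        if (mag < 0)
            mag = 0;
        /* Remove the i-th input's own sign from the total product. */
        int s = (in[i] < 0) ? -sign : sign;
        out[i] = s * mag;
    }
}
```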
### Coderate 1/2

Note how, at the same number of iterations, the layered and flooded decoders have different error-correction performance. When benchmarking their throughput, it is important to adjust their iteration counts so that throughput is compared at the same error-correction behavior.

![BER_vs_SNR_coderate_1_2_5_iterations.svg](uploads/3fb708586789d97424e2b7b85f43a8bf/BER_vs_SNR_coderate_1_2_5_iterations.svg)

![BER_vs_SNR_coderate_1_2_10_iterations.svg](uploads/8023982487eb7ea8e4bdc64f8c0614b8/BER_vs_SNR_coderate_1_2_10_iterations.svg)

![BER_vs_SNR_coderate_1_2_20_iterations.svg](uploads/555c5f0d523e52f6aa64186f86cdea96/BER_vs_SNR_coderate_1_2_20_iterations.svg)

### Coderate 8/9

![BER_vs_SNR_coderate_8_9_5_iterations.svg](uploads/ff57d21b7c41d7fb73967a2b6155556f/BER_vs_SNR_coderate_8_9_5_iterations.svg)

![BER_vs_SNR_coderate_8_9_10_iterations.svg](uploads/44b253aa7dbc417388e3c4f1cdbeaba5/BER_vs_SNR_coderate_8_9_10_iterations.svg)

![BER_vs_SNR_coderate_8_9_20_iterations.svg](uploads/7d686b46e813fe9c58a547050cabe221/BER_vs_SNR_coderate_8_9_20_iterations.svg)

### Coderate 9/10

![BER_vs_SNR_coderate_9_10_5_iterations.svg](uploads/13124b6f5274736ed812199e70f58617/BER_vs_SNR_coderate_9_10_5_iterations.svg)

![BER_vs_SNR_coderate_9_10_10_iterations.svg](uploads/78b49e075f6bf7886a98694de49b3ea1/BER_vs_SNR_coderate_9_10_10_iterations.svg)

![BER_vs_SNR_coderate_9_10_20_iterations.svg](uploads/bb0b9fd095360b486b9f42180a600168/BER_vs_SNR_coderate_9_10_20_iterations.svg)

## Throughput and latency
TODO: comparison between optimized and baseline code

The reference machine has a quad-core Intel i7-3770 CPU, with a nominal frequency of 3.40 GHz and a maximum frequency of 3.90 GHz.

### Baseline performance (4 threads enabled)

| Algorithm | Latency per codeword (ms) | Throughput (Mb/s) |
| :---- | ------ | --- |
| Flooded 20 iterations coderate 1/2 | 45.92 | 5.10 |
| Layered 10 iterations coderate 1/2 | 22.33 | 11.06 |
| Flooded 20 iterations coderate 8/9 | 42.71 | 6.03 |
| Layered 10 iterations coderate 8/9 | 72.19 | 3.69 |
| Flooded 20 iterations coderate 9/10 | 34.03 | 4.74 |
| Layered 10 iterations coderate 9/10 | 79.28 | 3.32 |

### Optimized performance (4 threads enabled)

| Algorithm | Latency per codeword (ms) | Throughput (Mb/s) |
| :---- | ------ | --- |
| Flooded 20 iterations coderate 1/2 | 12.84 | 8.29 |
| Layered 10 iterations coderate 1/2 | 8.28 | 30.16 |
| Flooded 20 iterations coderate 8/9 | 8.59 | 13.27 |
| Layered 10 iterations coderate 8/9 | 5.13 | 53.08 |
| Flooded 20 iterations coderate 9/10 | 9.51 | 11.61 |
| Layered 10 iterations coderate 9/10 | 5.13 | 53.08 |
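
One simple way to compare the two tables is the throughput speedup, i.e. the optimized throughput divided by the baseline throughput for the same configuration. The sketch below only restates the figures listed above; the array names and the `speedup` helper are hypothetical. For the layered decoder at code rate 1/2, it gives 30.16 / 11.06, or roughly 2.7.

```c
#include <assert.h>

/* Throughput figures (Mb/s) copied from the two tables, in row order:
 * flooded 1/2, layered 1/2, flooded 8/9, layered 8/9,
 * flooded 9/10, layered 9/10. */
static const double baseline_mbps[6]  = {5.10, 11.06, 6.03, 3.69, 4.74, 3.32};
static const double optimized_mbps[6] = {8.29, 30.16, 13.27, 53.08, 11.61, 53.08};

/* Ratio of optimized to baseline throughput for one configuration. */
static double speedup(double baseline, double optimized)
{
    assert(baseline > 0.0);
    return optimized / baseline;
}
```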