More Noise Resilience

With the automatic tuning feature the local clock becomes significantly more accurate. Since I also added persistence for the correction factors, it is natural to assume that the clock will be more accurate right from the start. After experimenting and testing for more than two months I now know that temperature influences at room temperature are low enough to achieve a precision of better than 3 ppm at startup. I even tried this under the roof, where temperature swings are larger than in the rest of the house.

In other words: the assumption that the clock is accurate to about 100 ppm at startup can be replaced with the assumption that the clock is accurate to 5 ppm or better. Since the time constants for the 100 ppm clock precision already contain some safety margin, this implies that I can increase the time constants for a tuned clock by a factor of 10.

The impact is that the initial decoder stage may collect 10 times the data and thus improve the signal-to-noise ratio accordingly. It follows that the clock will stay locked under even more adverse conditions. Also the threshold time for the unlocked to free transition can be increased 10 times. Thus recovery from signal fading will be much better than before.

So how is this done in practice? The crucial pieces of code are as follows.

namespace DCF77_Frequency_Control {
    void auto_persist() {
        // ensure that reading of data can not be interrupted!!
        // do not write EEPROM while interrupts are blocked
        int16_t adjust;
        int8_t  precision;
        const uint8_t prev_SREG = SREG;
        cli();
        if (data_pending && confirmed_precision > 0) {
            precision = confirmed_precision;
            adjust = DCF77_1_Khz_Generator::read_adjustment();
        } else {
            data_pending = false;
        }
        SREG = prev_SREG;
        if (data_pending) {
            int16_t ee_adjust;
            int8_t  ee_precision;
            read_from_eeprom(ee_precision, ee_adjust);

            if (confirmed_precision < abs(ee_precision) ||        // - precision better than it used to be
                ( abs(ee_precision) < 8 &&                        // - precision better than 8 Hz or 0.5 ppm @ 16 MHz
                  abs(ee_adjust-adjust) > 8 )           ||        //   deviation worse than 8 Hz (thus 16 Hz or 1 ppm)
                ( confirmed_precision == 1 &&                     // - It takes more than 1 day to arrive at 1 Hz precision
                  abs(ee_adjust-adjust) > 0 ) )                   //   thus it is acceptable to always write
            {
                cli();
                const int16_t new_ee_adjust = adjust;
                const int8_t  new_ee_precision = precision;
                SREG = prev_SREG;
                persist_to_eeprom(new_ee_precision, new_ee_adjust);
                DCF77_Clock_Controller::on_tuned_clock();
            }
            data_pending = false;
        }
    }

    void setup() {
        int16_t adjust;
        int8_t ee_precision;

        read_from_eeprom(ee_precision, adjust);
        if (ee_precision) {
            DCF77_Clock_Controller::on_tuned_clock();
        }

        const uint8_t prev_SREG = SREG;
        cli();

        SREG = prev_SREG;
        DCF77_1_Khz_Generator::adjust(adjust);
    }
}
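
The write condition in auto_persist() is easier to reason about when it is pulled out into a pure function that can be tested on a PC. The following is only a sketch: should_persist() is a hypothetical helper that is not part of the library, and it takes the captured values as parameters instead of reading the volatile globals.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper, not part of the library: the EEPROM write condition
// from auto_persist() as a pure function. All values are in Hz.
bool should_persist(int8_t confirmed_precision, int16_t adjust,
                    int8_t ee_precision, int16_t ee_adjust) {
    // auto_persist() only evaluates the condition for a confirmed result
    assert(confirmed_precision > 0);

    const int stored_precision = std::abs(static_cast<int>(ee_precision));
    const int drift = std::abs(static_cast<int>(ee_adjust) - static_cast<int>(adjust));

    return confirmed_precision < stored_precision     // new result is more precise
        || (stored_precision < 8 && drift > 8)        // stored value is precise but far off
        || (confirmed_precision == 1 && drift > 0);   // best precision: always follow
}
```

Keeping the decision free of side effects makes it possible to check the three cases without touching the EEPROM, which has a limited number of write cycles.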

As auto_persist() and setup() show, as soon as the precision reaches 8 Hz (or 0.5 ppm), or if setup loads a frequency adjustment from EEPROM, the code calls DCF77_Clock_Controller::on_tuned_clock(). As the name indicates, this should be considered something like an “event handler”. The implementation dispatches the call as follows.

    void on_tuned_clock() {
        DCF77_Demodulator::set_has_tuned_clock();
        DCF77_Local_Clock::set_has_tuned_clock();
    };

The two setters just increase the time constants for the DCF77_Demodulator and the DCF77_Local_Clock modules.

namespace DCF77_Demodulator {
    ...
    // how many seconds may be accumulated
    // this controls how slow the filter may be to follow a phase drift
    // N times the clock precision shall be smaller than 1/100
    // clock 30 ppm => N < 300
    uint16_t N = 300;
    void set_has_tuned_clock() {
        // will be called once crystal is tuned to better than 1 ppm.
        N = 3600;
    }
    ...
}

namespace DCF77_Local_Clock {
    ...
    uint32_t max_unlocked_seconds = 3000;
    void set_has_tuned_clock() {
        max_unlocked_seconds = 30000;
    };
    ...
}

The impact is tremendous. Do you remember the Phase Detection Experiment? N controls the bin size of the decoder for the phase detection. A larger value of N implies that it will tolerate significantly more noise. The noise tolerance is increased to a level where you can disconnect the receiver antenna for more than half an hour without losing the lock.
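
The rule of thumb from the code comment (N times the clock precision shall stay below 1/100) can be turned into a small calculation. This is a sketch only; max_bin_seconds() is a hypothetical helper and not part of the library. It returns the largest whole N for which the product does not exceed 1/100.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: largest whole number of seconds N for which
// N * (ppm * 1e-6) does not exceed 1/100, that is N <= 10000 / ppm.
uint32_t max_bin_seconds(uint32_t clock_error_ppm) {
    assert(clock_error_ppm > 0);
    return 10000u / clock_error_ppm;
}
```

For 30 ppm this gives 333, so the default N = 300 fits. For the 1 ppm mentioned in the comment of set_has_tuned_clock() the bound is 10000, so N = 3600 still leaves some margin.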

The increase of the max_unlocked_seconds to 30000 will in addition enable accelerated recapture of the phase lock for the first 30000 seconds (more than 8 hours) after losing a lock.

Although this is very cool, there is one issue with this approach that kept bugging me. Some of my readers (most notably from the UK) noticed that they have difficulties with the auto tune. This is because they are not only dealing with noise but with significant fading. Often they will have only some hours of DCF77 reception per day. The noise is easily filtered by my clock but the reception periods are usually too short to finish the auto tuning. So these guys with the weakest signals and the poorest SNR will not get this improved noise tolerance. Obviously there is a need for automatic tuning that can tolerate signal loss during the tuning process.

My solution is a new and significantly improved auto tuning algorithm. Below is the meat of the new implementation.

namespace DCF77_Frequency_Control {
    volatile int8_t confirmed_precision = 0;
    
    // indicator if data may be persisted to EEPROM
    volatile boolean data_pending = false;
    
    // 2*tau_max = 32 000 000 centisecond ticks = 5333 minutes
    volatile uint16_t elapsed_minutes;
    // 60000 centiseconds = 10 minutes
    // maximum drift in 32 000 000 centiseconds @ 900 ppm would result
    // in a drift of +/- 28800 centiseconds
    // thus it is uniquely measured if we know it mod 60 000
    volatile uint16_t elapsed_centiseconds_mod_60000;
    volatile uint8_t  start_minute_mod_10;

  
    // Seconds 0 and 15 already receive more computation than
    // other seconds thus calibration will run in second 5.
    const int8_t calibration_second = 5;
    
    volatile calibration_state_t calibration_state = {false ,false};
    volatile int16_t deviation;

    // get the adjust step that was used for the last adjustment
    //   if there was no adjustment or if the phase drift was poor it will return 0
    //   if the adjustment was from eeprom it will return the negative value of the
    //   persisted adjust step
    int8_t get_confirmed_precision() {
        return confirmed_precision;
    }
    
    void qualify_calibration() {
        calibration_state.qualified = true;
    };
    
    void unqualify_calibration() {
        calibration_state.qualified = false;
    };

    int16_t compute_phase_deviation(uint8_t current_second, uint8_t current_minute_mod_10) {
        int32_t deviation=
             ((int32_t) elapsed_centiseconds_mod_60000) -
             ((int32_t) current_second        - (int32_t) calibration_second)  * 100 -
             ((int32_t) current_minute_mod_10 - (int32_t) start_minute_mod_10) * 6000;

        // ensure we are between 30000 and -29999
        while (deviation >  30000) { deviation -= 60000; }
        while (deviation <=-30000) { deviation += 60000; }

        return deviation;
    }

    calibration_state_t get_calibration_state() {
        return *(calibration_state_t *)&calibration_state;
    }

    int16_t get_current_deviation() {
        return deviation;
    }

    void adjust() {
        int16_t total_adjust = DCF77_1_Khz_Generator::read_adjustment();

        // The proper formula would be
        //     int32_t adjust = (16000000 / (elapsed_minutes * 6000)) * new_deviation;
        // The total error of the formula below is ~ 1/(3*elapsed_minutes)
        //     which is  ~ 1/1000
        // Also notice that 2667*deviation will not overflow even if the
        // local clock would deviate by more than 400 ppm or 6 kHz
        // from its nominal frequency.
        // Finally notice that the frequency_offset will always be rounded towards zero
        // while the confirmed_precision is rounded away from zero. The first should
        // be considered a kind of relaxation while the second should be considered
        // a defensive computation.
        const int16_t frequency_offset = ((2667 * (int32_t)deviation) / elapsed_minutes);
        // In doubt confirmed precision will be slightly larger than the true value
        confirmed_precision = (((2667 - 1) * 1) + elapsed_minutes) / elapsed_minutes;
        if (confirmed_precision == 0) { confirmed_precision = 1; }

        total_adjust -= frequency_offset;
        
        if (total_adjust >  max_total_adjust) { total_adjust =  max_total_adjust; }
        if (total_adjust < -max_total_adjust) { total_adjust = -max_total_adjust; }
        
        DCF77_1_Khz_Generator::adjust(total_adjust);
    }

    void process_1_Hz_tick(const DCF77::time_data_t &decoded_time) {
        const int16_t deviation_to_trigger_readjust = 5;

        deviation = compute_phase_deviation(decoded_time.second, decoded_time.minute.digit.lo);

        if (decoded_time.second == calibration_second) {
            const bool leap_second_scheduled = decoded_time.leap_second_scheduled;
            // This is dirty: we overwrite a constant. This will only
            // work because we are in an interrupt and will not be interrupted.
            // We restore the constant immediately after the check.
            // Unfortunately this is necessary because we might be
            // in an unqualified state and thus the leap second information may be wrong.
            // However if we fail to detect this calibration will be wrong by
            // 1 second.
            // The cast must be to a reference; a cast to the value type would
            // only modify a temporary copy and leave decoded_time untouched.
            const_cast<DCF77::time_data_t &>(decoded_time).leap_second_scheduled = true;
            if (DCF77_Encoder::verify_leap_second_scheduled(decoded_time)) {
                // Leap seconds will mess up our frequency computations.
                // Handling them properly would be slightly more complicated.
                // Since leap seconds may only happen every 3 months we just
                // stop calibration for leap seconds and do nothing else.
                calibration_state.running = false;
            }
            const_cast<DCF77::time_data_t &>(decoded_time).leap_second_scheduled = leap_second_scheduled;

            if (calibration_state.running) {
                if (calibration_state.qualified) {
                    if ((elapsed_minutes >= tau_min_minutes && abs(deviation) >= deviation_to_trigger_readjust) ||
                        elapsed_minutes >= tau_max_minutes) {
                        adjust();

                        // enqueue write to eeprom
                        data_pending = true;
                        // restart calibration next second
                        calibration_state.running = false;
                    }
                } else {
                    // unqualified
                    if (elapsed_minutes >= tau_max_minutes) {
                        // running unqualified for more than tau minutes
                        //   --> the current calibration attempt is doomed
                        calibration_state.running = false;
                    }
                    // else running but unqualified --> wait for better state
                }
            } else {
                // (calibration_state.running == false) --> waiting
                if (calibration_state.qualified) {
                    elapsed_centiseconds_mod_60000 = 0;
                    elapsed_minutes = 0;
                    start_minute_mod_10 = decoded_time.minute.digit.lo;
                    calibration_state.running = true;
                }
                // else waiting but unqualified --> nothing to do
            }
        }
    }

    void process_1_kHz_tick() {
        static uint8_t divider = 0;
        if (divider < 9) {
            ++divider;
        }  else {
            divider = 0;

            if (elapsed_centiseconds_mod_60000 < 59999) {
                ++elapsed_centiseconds_mod_60000;
            } else {
                elapsed_centiseconds_mod_60000 = 0;
            }
            if (elapsed_centiseconds_mod_60000 % 6000 == 0) {
                ++elapsed_minutes;
            }
        }
    }
    ...
}

This code basically operates in 2*2 or 4 different states. It may be either in state “qualified” or “unqualified”. This refers to the sync state of the clock. If the clock syncs the state will shift to qualified (lower part of the diagram below) and if sync is lost it will shift to unqualified (upper part of the diagram).

The other two states are “running” and “waiting”. The transitions are connected as follows.

The important parts are of course the transitions between running and waiting. These transitions are computed once per minute at the dedicated “calibration second tick”. There is nothing special about this tick; however, due to the large time constants it suffices to compute this once per minute. It also implies that it becomes unnecessary to compute the seconds offsets because they are always 0 centiseconds (mod 6000 and thus also mod 60000).
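
The transitions can also be written down as a small host-side model of what process_1_Hz_tick() decides at the calibration second. This is a sketch under assumptions: cal_limits, cal_action and step() are hypothetical names, the limit values have to be supplied by the caller because tau_min_minutes and tau_max_minutes are not shown in this post, and the leap second handling is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical container for the limits; the real values live in the library.
struct cal_limits {
    uint16_t tau_min_minutes;
    uint16_t tau_max_minutes;
    int16_t  deviation_to_trigger_readjust;
};

enum class cal_action { none, start, adjust_and_restart, abort };

// One evaluation at the calibration second. Updates `running` and
// reports which transition was taken.
cal_action step(bool qualified, bool &running, uint16_t elapsed_minutes,
                int16_t deviation, const cal_limits &limits) {
    assert(limits.tau_min_minutes <= limits.tau_max_minutes);

    if (!running) {
        if (!qualified) { return cal_action::none; }  // waiting, unqualified
        running = true;                               // waiting, qualified: start measuring
        return cal_action::start;
    }
    if (qualified) {
        const bool drift_visible =
            elapsed_minutes >= limits.tau_min_minutes &&
            std::abs(static_cast<int>(deviation)) >= limits.deviation_to_trigger_readjust;
        if (drift_visible || elapsed_minutes >= limits.tau_max_minutes) {
            running = false;                          // adjust, then wait for a fresh start
            return cal_action::adjust_and_restart;
        }
        return cal_action::none;                      // keep measuring
    }
    if (elapsed_minutes >= limits.tau_max_minutes) {
        running = false;                              // unqualified for too long: give up
        return cal_action::abort;
    }
    return cal_action::none;                          // running, unqualified: keep counting
}
```

The key point for fading signals is the last branch: losing the sync while running does not abort the measurement, the counters simply keep going until the clock is qualified again or the maximum time is exceeded.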

While running it will count centiseconds (derived from the 1 kHz ticks) which are derived from the crystal clock. It will also count minutes. The minute count is tricky. While the clock is in sync the minutes are derived from the DCF77 signal. In addition the 1 Hz ticks are derived from the DCF77 phase lock. Thus if the clock is “qualified running” the minutes mod 10 plus the seconds and the centiseconds mod 60000 must match. Notice that this is not the “elapsed_minutes” which are also derived from the 1 kHz ticks. I am talking about the DCF77 minutes and seconds here.
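
To see how the crystal-derived counters behave, here is a host-side model of process_1_kHz_tick(). It is only a sketch: tick_counters and run_ticks() are hypothetical names, while the counter names mirror the ones in the listing above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side copy of the counters driven by the 1 kHz tick.
struct tick_counters {
    uint8_t  divider = 0;
    uint16_t elapsed_centiseconds_mod_60000 = 0;
    uint16_t elapsed_minutes = 0;
};

void on_1_kHz_tick(tick_counters &c) {
    if (c.divider < 9) {          // every 10th millisecond tick is one centisecond
        ++c.divider;
        return;
    }
    c.divider = 0;

    if (c.elapsed_centiseconds_mod_60000 < 59999) {
        ++c.elapsed_centiseconds_mod_60000;
    } else {
        c.elapsed_centiseconds_mod_60000 = 0;   // wrap after 10 minutes
    }
    assert(c.elapsed_centiseconds_mod_60000 < 60000);

    if (c.elapsed_centiseconds_mod_60000 % 6000 == 0) {
        ++c.elapsed_minutes;                    // 6000 centiseconds are one minute
    }
}

// Feed a number of 1 kHz ticks into the model.
void run_ticks(tick_counters &c, uint32_t ticks) {
    for (uint32_t i = 0; i < ticks; ++i) { on_1_kHz_tick(c); }
}
```

After 600 000 ticks (10 minutes of crystal time) the centisecond counter is back at 0 while elapsed_minutes has reached 10, so the minute counter keeps the long term information that the wrapped counter loses.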

You may also wonder why I work with centiseconds mod 60000 and minutes. The point is that I want to keep everything in 16 bit integers. This not only conserves memory, it also avoids costly 32 bit integer division operations. The downside is that there must be no “unqualified running” period with more than 30000 centiseconds (or 300 seconds) deviation. A quick mental computation indicates that even at 1000 ppm frequency error this will happen only after 30 000 000 centiseconds (5000 minutes), hence the ~5000 minutes maximum calibration time will not make this overflow.
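
The wrap-around argument can be checked with a few lines on a PC. The sketch below uses two hypothetical helpers: wrap_mod_60000() repeats the two while loops from compute_phase_deviation(), and worst_case_drift() computes the drift for a given elapsed time and frequency error.

```cpp
#include <cassert>
#include <cstdint>

// Fold a centisecond difference into the range -29999 .. 30000,
// as the two while loops in compute_phase_deviation() do.
int16_t wrap_mod_60000(int32_t deviation) {
    while (deviation >   30000) { deviation -= 60000; }
    while (deviation <= -30000) { deviation += 60000; }
    return static_cast<int16_t>(deviation);
}

// Hypothetical helper: drift in centiseconds that accumulates over
// `elapsed_centiseconds` at a frequency error of `ppm`.
int32_t worst_case_drift(int32_t elapsed_centiseconds, int32_t ppm) {
    assert(elapsed_centiseconds >= 0 && ppm >= 0);
    return static_cast<int32_t>(
        (static_cast<int64_t>(elapsed_centiseconds) * ppm) / 1000000);
}
```

A raw difference of 59990 centiseconds folds to -10, that is the local clock is 10 centiseconds behind rather than almost 10 minutes ahead. With the numbers from the code comments, 32 000 000 centiseconds at 900 ppm give a drift of 28800 centiseconds, which is still inside the unambiguous range.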

Very picky readers will also notice that my integer computations are not 100% accurate. Well, this is integer math, but it does not really matter. The point is that clock errors are not equally spaced, so there is always an uncertainty in the measurements. As it turns out, this uncertainty is significantly larger than the rounding errors of the integer math.
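
The two roundings in adjust() can be reproduced in isolation. The sketch below uses hypothetical helper names and returns plain int to keep the example simple, whereas the library stores the results in int16_t and int8_t. The constant 2667 is 16 000 000 / 6000 rounded up.

```cpp
#include <cassert>
#include <cstdint>

// Frequency offset in Hz at 16 MHz, rounded towards zero as in adjust().
int frequency_offset_hz(int16_t deviation_centiseconds, uint16_t elapsed_minutes) {
    assert(elapsed_minutes > 0);
    return static_cast<int>(
        (2667 * static_cast<int32_t>(deviation_centiseconds)) / elapsed_minutes);
}

// Resolution of the measurement in Hz, rounded up: ceil(2667 / elapsed_minutes).
int confirmed_precision_hz(uint16_t elapsed_minutes) {
    assert(elapsed_minutes > 0);
    return static_cast<int>(
        (2667 - 1 + static_cast<int32_t>(elapsed_minutes)) / elapsed_minutes);
}
```

For a deviation of 5 centiseconds after 100 minutes the exact value would be 133.33 Hz; the integer math yields 133 Hz with a confirmed precision of 27 Hz, so the rounding error is far below the stated precision.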

So how well does it perform now? Well, below are some log files which I got from Ian Castleton. (WordPress does not allow me to upload .txt.gz files. Thus I renamed the suffix to “key”. Please save the .key files as .txt.gz files and unpack them.) For the interpretation of the log files see the Swiss Army Debug Helper.

Ian lives in London. As you can see from the logs the signal quality is crappy. There is a *lot* of fading. And when there is no fading there is still considerable background noise. As you can see from the 2014 10 09 log it took almost two hours before the clock started to tune. Especially in the 2014 10 11 log you can see how poor the signal really is. Anyway the clock locked to it and tuned successfully to better than 0.5 ppm. Some long term tests are still running. But so far it looks as if it works perfectly well.

dcf77_2014_10_07.txt.gz
dcf77_2014_10_09.txt.gz
dcf77_2014_10_11.txt.gz

3 Responses to More Noise Resilience

marosy says:

August 21, 2018 at 22:45

Hi Udo,
Any hint on why the persistent tuning has been removed (github)?

- blinkenlightblog says:
  
  August 23, 2018 at 21:00
  
  It was not worth the effort. It caused more issues than it solved. The point is that if the clock is running continuously it does not help. If the clock is NOT running continuously there is no guarantee that the operational conditions at the next startup are anywhere near what was persisted. In particular it could happen for some very poor crystals that the feature made it impossible to acquire any lock at all. So the persistence introduced complexity and more code to the library but did not benefit everyone. It also implied that it might interfere with other uses of the EEPROM. The theoretical gain in noise resilience during startup is marginal in practice. Thus I decided to remove this feature.
  
marosy says:

August 24, 2018 at 16:07

Hi Udo,
Thank you for the explanation.