OCS watchdog


koal01
 

Hi Howard,
I don't know about other members' experience with OCS, but as far as I'm concerned the longest period without a webserver crash is 6 or 7 days.
When the webserver is down I have to reset the Ethernet shield and the OCS webserver comes back.
I tried the watchdog feature with different pins on the Mega but it does not work in my setup.
You talked some time ago about improving that, but before that I thought you might give some advice on what to check.
I'm available for any test if you want.
Thanks for all
Koal01


Howard Dutton
 

I'll take another look at that when time allows.

For what it's worth my OCS is very stable, months of up time, as power outages now and then allow.


Fernando Nino Sr
 

Ditto here, have had it running out in the observatory since Feb 2020 with no issues. I have a power adapter providing power to the Mega and shield. When the unit was connected to my desktop while I was trying to figure out the system, I had some issues. But now with the power adapter it runs stable.
Nino 


koal01
 

After checking inside the box I found an I2C plug that was not stable. I launched a continuous ping on my laptop, and when I touched this I2C plug with my finger the ping got no answers; I touched it again and the ping came back. I resoldered this part carefully and things seem to be going much better. Fingers crossed.
Howard, I'm very pleased to see all of OCS working nicely. I installed a brand new rain sensor this afternoon. I tested it by dropping water on it and the status moved to "rain", OK! I removed all the water carefully until it was dry, but it took one hour before the status on the web page moved back from warn to dry. When I launch the example sketch, the status changes are taken into account in a few seconds.
Maybe in OCS the code keeps a minimum time before showing a new status?

Thank you and nice skies to all !


koal01
 

The web server crashed after 2 days up, alas... I can still ping the Ethernet shield, which means OCS is working even if it does not communicate any more.
It is at this point, I imagine, that the watchdog could do something. To solve the problem I have to go physically to my shelter and hit the reset button, then the website works again immediately.
I can check that OCS did not stop working while the website was off since the 24 and 48 hour readings are available.
I'll check other parts...


Howard Dutton
 

I have not forgotten about this and will take a serious look at it.

I'm thinking of a feature where the OCS pings an external server every x hours and, if it fails to get a response, resets itself.  When I say resets, it needs to be a real reset of the Ethernet adapter where we use a GPIO attached to its reset pin.  So we let the watchdog expire (ping failed), the Mega2560 restarts, then (hard) resets the W5100 when it comes up.  That should allow recovery from just about anything I think.


koal01
 

Wow Howard, if you do that it would be very, very nice.
The address to ping could be the gateway!
I'm available for all the tests you need.

Thank you
Koal01


koal01
 

A very secondary question, but does anyone know why the MLX90614, which used to cost two or three dollars/euros, is not available for less than 14 euros/dollars?
Sometimes this component rises to more than 20 dollars.
Any relation with COVID and new applications for this component?
I wanted to change it but seeing the prices I think I'm going to wait.
Koal01


Fernando Nino Sr
 

A lot of people are using it to build temperature sensing devices for COVID. The price went sky high.


Howard Dutton
 

On Fri, Jun 19, 2020 at 05:21 AM, Howard Dutton wrote:
I have not forgotten about this and will take a serious look at it.

I'm thinking of a feature where the OCS pings an external server every x hours and, if it fails to get a response, resets itself.  When I say resets, it needs to be a real reset of the Ethernet adapter where we use a GPIO attached to its reset pin.  So we let the watchdog expire (ping failed), the Mega2560 restarts, then (hard) resets the W5100 when it comes up.  That should allow recovery from just about anything I think.
The OCS (master branch) has been updated with this feature:
  • It's just a Config.h option where you specify an IP address that the OCS connects to periodically (on port 80) at the specified # of hours.
    • The default address is the IP of arduino.cc but you could also use your home network's router web page etc (192.168.1.1 for example) as a target if going out on the 'net isn't optimal.
  • If the connection attempt fails the OCS is reset via letting the watchdog timer expire.  The OCS will then reboot and run for that specified # of hours before checking again (so it won't just loop resetting rapidly.)
  • I believe the Ethernet adapter needs its reset pin controlled from a Mega2560 GPIO to be properly reset (by the watchdog method.)  Note there are TWO reset pin connections on the W5100 shield: one on the side header near the power connections and one on the ICSP header.  These need to be electrically disconnected from the Mega2560 reset, and instead the W5100's reset signal needs to be connected to a Mega2560 GPIO pin that the OCS takes control of to reset the shield as required.
  • In my experiments with the watchdog timer I noticed that one of my three Mega2560 would not work properly with the feature.  The comms lights would flash like crazy and the Mega2560 would never return from a watchdog reboot.   The other two worked perfectly, I expect this is due to the Arduino bootloader... so test to be sure this works with your h/w!
  • There is a debug mode associated with this feature (near the bottom of Config.h) where it'll let you know (via Serial) when the board is rebooted, what the connection status was (success/failure,) and also changes the time between connection checks (specified in Config.h) from hours to seconds so you can effectively test without waiting forever!
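
For readers who want to see the logic of the periodic check in one place, here is a minimal host-side sketch in plain C++. It is not the OCS source: the names ConnectionCheck, checkIntervalMs, makeCheck and poll are hypothetical, and the connection attempt is passed in as a callable so the logic can be tried on a PC. It assumes the first check runs one full interval after boot, which is how the "run for that specified # of hours before checking again" behaviour is read here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout: this models the behaviour described in the
// list above, it is not the OCS implementation.
struct ConnectionCheck {
  uint32_t intervalMs;   // time between connection checks
  uint32_t lastCheckMs;  // time of the last check (0 = boot)
  bool feedWatchdog;     // false => stop feeding and let the watchdog expire
};

// Convert the Config.h-style number to milliseconds. In debug mode the same
// number is read as seconds instead of hours, so tests do not take forever.
inline uint32_t checkIntervalMs(uint32_t n, bool debugMode) {
  return debugMode ? n * UINT32_C(1000) : n * UINT32_C(3600000);
}

inline ConnectionCheck makeCheck(uint32_t n, bool debugMode) {
  return ConnectionCheck{checkIntervalMs(n, debugMode), 0, true};
}

// Call on every pass of the main loop. connectOk is only invoked when a check
// is due. Returns true while the caller should keep feeding the watchdog.
template <typename ConnectFn>
bool poll(ConnectionCheck& c, uint32_t nowMs, ConnectFn connectOk) {
  if (c.feedWatchdog &&
      static_cast<uint32_t>(nowMs - c.lastCheckMs) >= c.intervalMs) {
    c.lastCheckMs = nowMs;
    if (!connectOk()) c.feedWatchdog = false;  // failure: allow the reset
  }
  return c.feedWatchdog;
}
```

Because the timer restarts from zero after every reboot, a target that stays unreachable causes at most one reset per interval rather than a rapid reset loop.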


koal01
 

Hi Howard,
Fantastic and thank you for this very helpful feature.
I'm going to install the version right now and let you know.
My best uptime is 12000 minutes; I hope this option will keep it stable longer!
Nice skies !
Koal01


koal01
 

Hi Howard,
My first tests with this new watchdog are not a success.
I tested different pins (8, 11, 53) and each time one of these pins is physically connected to the reset button, the one which is near the power connections, the network does not start.
I'm launching a persistent ping to check when the Ethernet shield brings up the network. I see the shield trying to reboot with the yellow LEDs and the green one on the right, but without success since I can't see any response to the ping.
When I disconnect the jumper (pin 8, 11 or 53) connected to the reset pin, the network starts and the ping shows a response.
But it shouldn't work like this, should it? I understood that the jumper has to be wired between the reset pin and the pin defined in Config.h.
Here are the parameters I set in Config.h:
#define ETHERNET_RESET_PIN            53 //    OFF, n. Where n=unused Mega2560 pin #, activates feature and allows the OCS   Option
                                          //         to force a reset of the W5100 Ethernet Shield at startup using this pin.
                                          //         W5100 reset pin must be connected to the pin# specified here and no other.
 
// WATCHDOG SETTINGS ---------------------------------------------------------------------------------------------------------------
#define WATCHDOG                      ON //    OFF, ON to reset the Mega2560 after 8 seconds if it hangs for any reason.     Option
#define WATCHDOG_CHECK_HOURS          1 //    OFF, n. Where n=1 to 48 (hours.)  Watchdog address connection check.          Option
IPAddress watchdog(192,  168, 4, 254);   //  ..113, Connections to this address (port 80) monitors Ethernet adapter health.  Option
                                          //         default is arduino.cc.
I have to mention that there are 2 occurrences of
#define WATCHDOG
The first one is at the top of Config.h and the second one at the bottom of the same file; I commented out the second one.

Thank you
Koal01


Howard Dutton
 

On Thu, Jul 2, 2020 at 01:01 PM, koal01 wrote:
I tested different pins (8, 11, 53) and each time one of these pins is physically connected to the reset button, the one which is near the power connections, the network does not start.
I'm launching a persistent ping to check when the Ethernet shield brings up the network. I see the shield trying to reboot with the yellow LEDs and the green one on the right, but without success since I can't see any response to the ping.
When I disconnect the jumper (pin 8, 11 or 53) connected to the reset pin, the network starts and the ping shows a response.
But it shouldn't work like this, should it? I understood that the jumper has to be wired between the reset pin and the pin defined in Config.h.
What about the W5100 reset signal on the ICSP header, is that disconnected from the Mega2560 reset signal too?


koal01
 

Concerning ICSP, the Mega 2560 has 6 long male header pins plugged into the female headers of the W5100 Ethernet shield.
Is this wiring wrong, or does it disturb the watchdog?


Howard Dutton
 

The Mega2560 reset signal is sent to the Ethernet shield at TWO places.  I physically disconnect both of these.



I bent the pin on the side header but then cut (!) the ICSP reset pin off.

How you go about testing this before making permanent h/w modifications is up to you!


koal01
 

OK, I see! I'm going to do the same: I neutralize both reset pins on the Mega and wire the reset button of the Ethernet shield to a designated pin (8 or 53 for example).
Is that right?
 


Howard Dutton
 

On Fri, Jul 3, 2020 at 06:05 AM, koal01 wrote:
OK, I see! I'm going to do the same: I neutralize both reset pins on the Mega and wire the reset button of the Ethernet shield to a designated pin (8 or 53 for example).
Is that right?
Yes.
The yellow highlighted pins must NOT arrive at the Ethernet shield.
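
Once the shield's reset line is wired only to a GPIO, the startup reset amounts to pulsing that pin low and then releasing it. The sketch below shows that sequence in plain C++ against a stand-in board so it can be checked on a PC. FakeBoard, resetEthernetShield and both timing constants are hypothetical, not taken from the OCS source. On a real Mega2560 the three board calls would map to pinMode(), digitalWrite() and delay().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the board: it records pin writes and tracks simulated time.
struct FakeBoard {
  struct Event { int pin; bool high; uint32_t atMs; };
  std::vector<Event> events;
  uint32_t nowMs = 0;
  void setOutput(int /*pin*/) {}
  void write(int pin, bool high) { events.push_back({pin, high, nowMs}); }
  void delayMs(uint32_t ms) { nowMs += ms; }
};

// Assumed, deliberately generous timings; check the W5100 datasheet for the
// real minimum reset pulse width and start-up time.
constexpr uint32_t RESET_LOW_MS = 100;
constexpr uint32_t RESET_SETTLE_MS = 500;

// Pulse the shield's active-low reset line, then wait before Ethernet init.
template <typename Board>
void resetEthernetShield(Board& board, int resetPin) {
  board.setOutput(resetPin);
  board.write(resetPin, false);    // hold the W5100 in reset
  board.delayMs(RESET_LOW_MS);
  board.write(resetPin, true);     // release reset
  board.delayMs(RESET_SETTLE_MS);  // let the chip come up
}
```

This only works if the GPIO is the sole driver of the shield's reset line, which is why both connections to the Mega2560 reset signal have to be isolated first.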


koal01
 

I did what you said and the watchdog works instantly, so nice Howard, thank you!!!
I'll see over the coming days and weeks how it works and let you know.
One thing that I really don't understand is that when the watchdog is on the charts don't work, and this behaviour has existed since the beginning of OCS...
If I open the log corresponding to the day I can see only one line of data, nothing more.
The result is a straight line in all the charts...
I don't know if it is specific to me or if it is a singular bug.
Anyway, the watchdog is a great step and I would like to thank you again for this very important feature.

Nice skies
Koal01


koal01
 

I did some tests this weekend on the new version 2.2 a.
I set the watchdog check hours to 1 and to 12, and in both cases the watchdog works fine but the resets are more frequent than intended.
I thought that setting check hours to 1 means one reset each hour if the ping to the specified address does not work, or one every 12 hours if I set it to 12.
The result is many resets, with the uptime always updated.

Concerning the compatibility with the charts, I noted that if I erase the day's log file on the SD card, set the watchdog to OFF, power-cycle OCS, then set the watchdog to ON, the charts work for one day.
But at 12 pm, when OCS generates a new log file, if the watchdog is ON it generates a 1 KB log file, a kind of corrupted file that won't let OCS write data lines to it.
Howard, is it possible to insert a condition so that at 12 pm OCS switches off the watchdog to let it generate a sane file, and switches it ON again once the file is created?
I'm not sure that it is the best solution, but it would be a workaround until I find out why my system has this behaviour.

Thank you !
Koal01



Howard Dutton
 

On Mon, Jul 6, 2020 at 07:45 AM, koal01 wrote:
I did some tests this weekend on the new version 2.2 a.
I set the watchdog check hours to 1 and to 12, and in both cases the watchdog works fine but the resets are more frequent than intended.
I thought that setting check hours to 1 means one reset each hour if the ping to the specified address does not work, or one every 12 hours if I set it to 12.
The result is many resets, with the uptime always updated.
I don't recall seeing that happening.  Reviewed the code too, looks good.  You do have DEBUG_WATCHDOG mode OFF, right?

Concerning the compatibility with the charts, I noted that if I erase the day's log file on the SD card, set the watchdog to OFF, power-cycle OCS, then set the watchdog to ON, the charts work for one day.
But at 12 pm, when OCS generates a new log file, if the watchdog is ON it generates a 1 KB log file, a kind of corrupted file that won't let OCS write data lines to it.
Howard, is it possible to insert a condition so that at 12 pm OCS switches off the watchdog to let it generate a sane file, and switches it ON again once the file is created?
I'm not sure that it is the best solution, but it would be a workaround until I find out why my system has this behaviour.
I patched for this bug, 90% sure I got it.  Writing the blank log file takes > 8 seconds and the watchdog resets the Mega2560.
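
The patch itself is not shown in this thread. One common way to handle an operation that outlasts the watchdog timeout is to feed the watchdog between chunks of the work, and the host-side C++ simulation below illustrates that idea only. SimWatchdog, writeBlankLog, the record count and the per-record timing are all hypothetical. The 8 second timeout comes from the Config.h comment quoted earlier. On the Mega2560 the feed() call would correspond to wdt_reset() from <avr/wdt.h>.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t WATCHDOG_TIMEOUT_MS = 8000;  // "8 seconds" per Config.h

// Simulated watchdog: trips if more than the timeout passes between feeds.
struct SimWatchdog {
  uint32_t sinceFeedMs = 0;
  bool tripped = false;
  void feed() { sinceFeedMs = 0; }
  void elapse(uint32_t ms) {
    sinceFeedMs += ms;
    if (sinceFeedMs > WATCHDOG_TIMEOUT_MS) tripped = true;
  }
};

// Writes `records` blank records, each taking msPerRecord (assumed figures).
// Feeds the watchdog every feedEvery records; 0 means never feed.
// Returns true if the whole file is written without the watchdog tripping.
inline bool writeBlankLog(SimWatchdog& wd, uint32_t records,
                          uint32_t msPerRecord, uint32_t feedEvery) {
  wd.feed();
  for (uint32_t i = 0; i < records; ++i) {
    wd.elapse(msPerRecord);        // stand-in for one SD card write
    if (wd.tripped) return false;  // real hardware would reboot here
    if (feedEvery != 0 && (i + 1) % feedEvery == 0) wd.feed();
  }
  return true;
}
```

With 1440 records at 10 ms each (about 14 seconds in total), the write is cut short when the watchdog is never fed, and completes when it is fed every 100 records.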