Announcement

Collapse
No announcement yet.

System lags after network changes

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • System lags after network changes

    I made some annoying discovery. When changing the network settings, the system seems to "hang" for about quarter a minute. I use the XPort-Pro with Evolution SDK V5.2.0.0. The changes of the network settings are done using the "XMLImport()"-function.

    My application contains a webpage that shows the current network settings. So, there is a HTTP callback function that reads the network settings via XML interface and writes them to the HTTP client. Just reading this settings works reliable within several hundreds of milliseconds.

    But: When changing these options - to be more precise: the DHCP setting from "enable" to "disable" - the system lags. When looking at "WebManager -> Diagnostics -> Processes", there is a peak of 100% CPU load. Furthermore, the HTTP callback function that writes the XML configuration does not finish within few seconds, but lasts 15 to 20 seconds... So it put in several TLOG()-calls to find the source of the problem. Suprisingly, there are time gaps of several seconds between two following TLOG() calls (without any other operations within them).

    Using a network sniffer, I found out that the HTTP call finishes after about 20 seconds - 9 seconds after my last TLOG() call right in front of "return true;". The only questionable thing are ARP calls to resolve the MAC address of IP "0.0.0.0"; I do not know the ARP protocol, maybe it is a usual practice.

    Any ideas where to look for a solution?

    EDIT: I forgot about one important fact: The lag seems to appear only the first time when I change this setting.
    Last edited by TESK; 12-09-2010, 03:16 PM.

  • #2
    DHCP disable causing spin loop

    I have tried to witness this with just our standard product, but it works okay when importing XML just doing the disable and enable of DHCP. No change actually takes effect without a reboot.

    If you can take us through a series of steps with the standard product (no SDK code) that produces the behavior, that would be the most direct way to get to the root of this.

    If not, the behavior could be an interaction with your code. To help debug this, the "show task states" CLI command in R20 provides call stack information which, in combination with your link map, could help to identify where in the code the spin loop actually is.

    The ARP request with a zero address does not sound right. I am not seeing that happen on my device.

    Comment


    • #3
      To clarify the edit of my last post: The lag only appears the first time, when the DHCP client is disabled after booting the system. Further changes of the setting are "fast" (or as least as fast as the usual XML access).

      I've created a small project that only contains a minimalistic implementation that covers this phenomenon. It contains an entry function that registers one HTTP callback function. This callback function contains (the timestamp in bracket shows the relative time to the first TCP data of the HTTP request):
      • syslog entry (15 ms)
      • read the whole configuration for eth0 using XMLReadValue_malloc()
      • set the DHCP client to "disable" in the locally stored data
      • syslog entry (67 ms)
      • generate an XML string out of the local configuration with the whole "configgroup" and write it back to the device using XMLImportFromStream()
      • syslog entry (142 ms)
      • (last http data of the HTTP response: 4240 ms)


      So this takes "only" about 4 seconds to complete (means: the web browser says "ready"). The difference to my first post shows the time for other files that are included in the HTML construct. In this minimalistic project, the answer contains just plain text.

      Using a network sniffer, I see again this arp message to 0.0.0.0. I am using a network with just two members: the XPort-Pro device (IP 192.168.2.11) and the NIC of my PC (IP: 192.168.2.2). Right in between the syslog entries of writing the network configuration, I see two ARP telegrams on the line:
      1. ARP from XPort to the IP of the PC (with MAC 00:00:00:00:00:00) requesting the MAC of 192.168.2.2 (op code 0x0001); the PC answers with op code 0x0002 and the MAC address (looks okay to me)
      2. ARP from XPort to the MAC of my PC requesting the MAC address of IP address 0.0.0.0; here the PC does not respond.


      When trying to get the CLI via telnet, I experienced the same lag as described. So I used the CLI via serial line. There are two tasks with a "higher" tick number:
      5 TIME DNS Cache -> Delay: (dns.c 1490) [4267259 ticks]
      8 TIME Dynamic IP Daemon (eth0) -> Delay: (dhcp.c 1313) [4270884 ticks]
      All other tasks show a tick number from 0 to 10000. I suppose, that the tick numbers are independent of my actions: There are no big differences when writing the network config.

      I hope, this helps a litte bit.

      Comment


      • #4
        Clarification

        It does sound like you are homing in to something. Unfortunately, I still do not have enough to reproduce any problem. I suggest that you contact your Field Application Engineer, and attempt to provide enough information and instructions so that he or she can actually replicate your problem.

        Comment


        • #5
          Unfortunately, my field application engineer told me that he is too busy at the moment. Therefore I tried to write a small example that shows this phenomenon.

          Here some important steps:
          • Take a "fresh" XPort-Pro device (or: restore factory defaults)
          • With WebManager: enable syslog ("debug") and forward the messages to your syslog deamon
          • With WebManager: disable DHCP client and use a static ip and gateway
          • Build an image using the supplied code, and other files from the samples (config.h, startup.c, xport_pro.ld, ...) using Evolution V5.2.0.0R22 and the library with WebManager
          • Load the image via FTP
          • Wait for the device rebooting
          • Start network sniffer on your PC
          • Start syslog deamon on your PC
          • Call web address "http://[IP]/dhcpon"


          My example application:
          Code:
          #include <evolution.h>
          #include <evolution_stdio.h>
          #include <evolution_xml.h>
          #include <evolution_http.h>
          #include <evolution_tlog.h>
          #include <startup.h>
          
          #define XML_BUF_SIZE  (16384)
          #define ETH0_IP       "192.168.2.11"
          #define ETH0_GATEWAY  "192.168.2.2"
          
          
          /**
           * Error handler for XMLImport function
           */
          void xmlErrorHandler(const char* chMsg, void* opaque)
          {
            (void) opaque;
          
            TLOG(TLOG_SEVERITY_ERR, "XML import error");
            TLOG(TLOG_SEVERITY_ERR, chMsg);
          }
          
          
          /**
           * Turn on/off DHCP client via XMLImport function
           */
          void setDhcpState(struct http_client* sHttpClient, bool boEnabled)
          {
            struct input_stream_with_const_char_ptr sXmlStream;
            char chBuffer[200], * chXml;
            uint8_t  bCounter;
            uint32_t lTimestamp;
          
            // display elapsed time relative to this timestamp
            lTimestamp = OSTimemarkMS();
            (void) snprintf(chBuffer, sizeof(chBuffer), "%5lu: setDhcpState(%s)\r\n",
              OSElapsedTimeCurrentMS(lTimestamp), boEnabled ? "on" : "off");
            (void) HttpWriteData(sHttpClient, chBuffer);
          
            // malloc "some" memory for XML description
            chXml = MALLOC_ZEROED(XML_BUF_SIZE);
            if (chXml == NULL)
            {
              (void) snprintf(chBuffer, sizeof(chBuffer), "%5lu: malloc failed\r\n",
                OSElapsedTimeCurrentMS(lTimestamp));
              (void) HttpWriteData(sHttpClient, chBuffer);
              return;
            }
          
            // build XML string (suppose there is enough space within chXml)
            (void) strcpy(chXml, "<?xml version=\"1.0\" standalone=\"yes\"?>"
              "<configrecord version=\"0.1.0.0T0\">"
              "<configgroup name=\"interface\" instance=\"eth0\">"
              "<configitem name=\"bootp\"><value>disable</value></configitem>");
            if (boEnabled)
            {
              (void) strcat(chXml, "<configitem name=\"dhcp\"><value>enable</value></configitem>"
                "<configitem name=\"ip address\"><value>&amp;#60;None&amp;#62;</value></configitem>"
                "<configitem name=\"default gateway\"><value>&amp;#60;None&amp;#62;</value></configitem>");
            }
            else
            {
              (void) strcat(chXml, "<configitem name=\"dhcp\"><value>disable</value></configitem>"
                "<configitem name=\"ip address\"><value>" ETH0_IP "</value></configitem>"
                "<configitem name=\"default gateway\"><value>" ETH0_GATEWAY "</value></configitem>");
            }
            strcat(chXml, "<configitem name=\"hostname\"><value></value></configitem>"
              "<configitem name=\"domain\"><value></value></configitem>"
              "<configitem name=\"dhcp client id\"><value></value></configitem>"
              "<configitem name=\"primary dns\"><value>&amp;#60;None&amp;#62;</value></configitem>"
              "<configitem name=\"secondary dns\"><value>&amp;#60;None&amp;#62;</value></configitem>"
              "<configitem name=\"mtu\"><value>1500 bytes</value></configitem>"
              "</configgroup>"
              "</configrecord>");
          
          
            // Import XML settings
            if (StreamInitWithConstCharPtr(&sXmlStream, chXml))
            { // ok
              (void) snprintf(chBuffer, sizeof(chBuffer), "%5lu: starting XML import\r\n",
                OSElapsedTimeCurrentMS(lTimestamp));
              (void) HttpWriteData(sHttpClient, chBuffer);
          
              if (!XMLImportFromStream(&sXmlStream.inStream, xmlErrorHandler, NULL))
              {
                (void) snprintf(chBuffer, sizeof(chBuffer), "%5lu: XML import failed\r\n",
                  OSElapsedTimeCurrentMS(lTimestamp));
                (void) HttpWriteData(sHttpClient, chBuffer);
              }
          
              (void) snprintf(chBuffer, sizeof(chBuffer), "%5lu: XML import finished\r\n",
                OSElapsedTimeCurrentMS(lTimestamp));
              (void) HttpWriteData(sHttpClient, chBuffer);
            }
            else
            { // failed
              (void) snprintf(chBuffer, sizeof(chBuffer), "%5lu: streamInit failed\r\n",
                OSElapsedTimeCurrentMS(lTimestamp));
              (void) HttpWriteData(sHttpClient, chBuffer);
            }
          
            // free XML buffer
            Free(chXml);
          
            // print some stuff about every 100 ms
            for (bCounter = 0; bCounter < 20; bCounter++)
            {
              (void) snprintf(chBuffer, sizeof(chBuffer), "%5lu: end of setDhcpState %u\r\n",
                OSElapsedTimeCurrentMS(lTimestamp), bCounter);
              (void) HttpWriteData(sHttpClient, chBuffer);
          
              // delay for about 100 ms
              DELAY_THREAD(0, DLY_MSECS, 100);
            }
          
          }
          
          
          /**
           * HTTP callback functions to turn on/off DHCP client
           */
          bool callbackFcnOn(struct http_client* sHttpClient)
          {
            TLOG(TLOG_SEVERITY_DEBUG, "callbackFcnOn start");
            setDhcpState(sHttpClient, true);
            TLOG(TLOG_SEVERITY_DEBUG, "callbackFcnOn end");
            return true;
          }
          
          bool callbackFcnOff(struct http_client* sHttpClient)
          {
            TLOG(TLOG_SEVERITY_DEBUG, "callbackFcnOff start");
            setDhcpState(sHttpClient, false);
            TLOG(TLOG_SEVERITY_DEBUG, "callbackFcnOff end");
            return true;
          }
          
          
          /**
           * Register two callback functions for "/dhcpon" and "/dhcpoff"
           */
          int SDKMain(struct command_line *commandLine, struct http_server *httpServer)
          {
            (void) commandLine;
          
            if (!HttpDynamicCallbackAdd (httpServer, "/dhcpon", callbackFcnOn))
            {
              TLOG(TLOG_SEVERITY_CRIT, "registerCallback: /dhcpon");
            }
          
            if (!HttpDynamicCallbackAdd (httpServer, "/dhcpoff", callbackFcnOff))
            {
              TLOG(TLOG_SEVERITY_CRIT, "registerCallback: /dhcpoff");
            }
          
            return 0;
          }
          Here my results:

          Syslog (with relative time in ms from first message):
          • 0: "main.c: callbackFcnOn start"
          • 2110: "main.c: callbackFcnOn end"


          HTTP data from web browser (timestamps seem ok, but output "lags"):
          • 0: setDhcpState(on)
          • 2: starting XML import
          • 111: XML import finished
          • 111: end of setDhcpState 0
          • 211: end of setDhcpState 1
          • 311: end of setDhcpState 2
          • ... (shortened)
          • 1911: end of setDhcpState 18
          • 2011: end of setDhcpState 19


          Finally important events from the network sniffer (again relative timestamps from the first sniffed paket):
          • 0: GET /dhcpon
          • 4: syslog-msg: "main.c: callbackFcnOn start"
          • 84: arp request from XPort to Broadcast (requesting 192.168.2.2: PC)
          • 84: arp response from PC with MAC
          • 214: arp request from XPort to Broadcast (requesting 0.0.0.0)
          • 2115: syslog-msg: "main.c: callbackFcnOn end"
          • ... (shortened)
          • 5115: XPort sends HTTP data containing: "end of setDhcpState 3" till "end of setDhcpState 12"
          • 9615: XPort sends last data from request containing: "end of setDhcpState 13" till "end of setDhcpState 19"


          It would be great, if someone could try to reproduce this.

          Comment


          • #6
            Any updates for the described problem (year 2011)?

            I watched another occurence of an ARP packet to "0.0.0.0". A TCP connection from the PC to the XPP (Evolution OS V5.2.1.0 B8) HTTP server has been established (SYN, SYN/ACK, ACK). The PC sends data to the XPP, which are not ACK'ed. Instead, the XPP sends:
            Code:
            10:17:48.427	XPP -> Broadcast	ARP	Who has 0.0.0.0?  Tell IP.IP.IP.IP
            What can be the reason for this ARP call or the missing of the TCP ACK? The dropped TCP packets are retransmitted after two seconds and the HTTP request finishes.

            Comment

            Working...
            X