Maximum UDP transmit rate is just 50 MB/s and with 100% CPU usage!??

clayman87 wrote:

Hi! I was experimenting with custom flow control techniques for bulk transfers over UDP when I discovered something very weird. Please take a look at the following code: it just sends UDP datagrams of 1400 bytes in an endless loop to some IP address.

    #include <winsock2.h>   // Winsock 2; link with ws2_32.lib
    #include <conio.h>      // kbhit()
    #include <stdio.h>

    // WSAStartup() must have been called before any other Winsock call.
    WSADATA wsaData;
    WSAStartup( MAKEWORD( 2, 2 ), &wsaData );

    SOCKET sock = socket( AF_INET, SOCK_DGRAM, IPPROTO_UDP );

    sockaddr_in targetAddr;
    targetAddr.sin_addr.s_addr = inet_addr( "...some IP..." );
    targetAddr.sin_family = AF_INET;
    targetAddr.sin_port = htons( 1337 );

    char arr[1400];
    long long sent = 0;
    while( !kbhit() )
    {
        for( int i = 0; i < 1000; ++i )
        {
            long res;
            if( (res = sendto( sock, arr, 1400, 0, (sockaddr*)&targetAddr, sizeof( targetAddr ) )) == SOCKET_ERROR )
            {
                printf( "Error: %d\n", WSAGetLastError() );
                return -1;
            }
            sent += res;
        }

        printf( "\r%ld MBs sent", (long)(sent >> 20) );
    }

When I run the program, every sendto() call succeeds and reports having sent 1400 bytes of data. The interesting thing is that I get a transfer rate of just about 50 MB/s but 100% CPU usage on one core (mostly kernel-mode). Now:

-- My computer is connected to an Ethernet 100BaseTX network, which obviously does not support the transfer rate above, so datagrams are already lost before even reaching the network. Why does sendto() then report having sent the data, and, what is more, why does it not block when the I/O buffers fill up? (The documentation says that it should.)

-- How on Earth can someone utilize the full potential of, say, a Gigabit Ethernet network, if just sending data at even half of its capacity already causes maximum CPU load?

So, what am I doing wrong, and why on Earth does sendto() take so long? Any suggestion is very welcome. Thanks, clayman

P.S.: I've run a test sending 140 bytes of data each time, and the transfer rate basically dropped to 5 MB/s -- so the _number_ of sendto() calls seems to be the bottleneck.
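(For reference, the MB/s figures above come from timing the loop externally, roughly like the hypothetical sketch below -- not the exact harness I ran, just the idea: wrap the send loop with the high-resolution counter and divide bytes by elapsed seconds.)

    #include <windows.h>   // QueryPerformanceCounter/Frequency

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency( &freq );
    QueryPerformanceCounter( &t0 );

    // ... run the sendto() loop above, accumulating `sent` ...

    QueryPerformanceCounter( &t1 );
    double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    printf( "%.1f MB/s\n", (double)sent / (1024.0 * 1024.0) / seconds );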

Adam Roderick J wrote (#4):

For gigabit transfer rates, CPU speed is a big factor, so may I ask what the speed of your CPU is?

clayman87 replied (#5):

Thank you for your question. The test machine was an Intel P35 chipset-based system: Core 2 Duo E6400 2133 MHz CPU, 2 GB of RAM, Gigabyte P35-DS3 motherboard.

killabyte replied to the original post (#6):

http://www.wireshark.org/ for all your network debugging needs. This is honestly the best thing since sliced bread for networking tools.

clayman87 replied (#7):

I took your advice and did some testing in Wireshark. Based on the "Identification" field in the IP header, which seems to be assigned sequentially, everything looks fine down to the bottom of the IP layer, but then Wireshark only registers about every fifth packet as an Ethernet frame, which correlates well with the fact that a ~60 MB/s data stream is going down a pipe with a capacity of 12.5 MB/s (100 Mbit). As I've learned, this layer is usually implemented in the network card driver. A poorly written driver or interface implementation may be the cause of this "no block" problem. As for the transfer rate, I'm still puzzled.

Adam Roderick J replied (#8):

The CPU is not the problem here; the high-speed bus interfacing to your NIC is. Check the speed of the bus, as it has a lot to do with this. If the bus cannot support more than 100-200 MB/s, then we cannot do anything. :(

clayman87 replied (#9):

The NIC is on a PCI-E 1.0 lane, which should be 250 MB/s, and that is enough on Ubuntu 8.10 (at least for 1 gigabit). The question is what causes the 100% CPU load on Windows XP even at 60 MB/s, and on Ubuntu at 130 MB/s.

David Crow replied to the original post (#10):

clayman87 wrote:

Why does sendto() then report having sent the data...

One of the fundamental differences between UDP and TCP is that UDP is an unreliable transport service: it does not guarantee datagram delivery, and no notification is sent if a packet is not delivered.

                "Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown

                "Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons

                C 1 Reply Last reply
                0
                • D David Crow

                  clayman87 wrote:

                  Why does sendto() then reports having sent the data...

                  One of the fundamental differences between UDP and TCP is that UDP is an unreliable transport service because it does not guarantee data packet delivery and no notification is sent if a packet is not delivered.

                  "Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown

                  "Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons

                  C Offline
                  C Offline
                  clayman87
                  wrote on last edited by
                  #11

                  Let me rephrase my question. MSDN states: "If no buffer space is available within the transport system to hold the data to be transmitted, sendto will block unless the socket has been placed in a nonblocking mode". In the protocol stack, at which layer is this buffer space located?
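(For what it's worth, the per-socket send buffer that Winsock itself exposes can at least be inspected with getsockopt -- a minimal sketch below; whether this is the same "buffer space within the transport system" is exactly what I'm unsure about:)

    // Sketch: inspect the per-socket send buffer that Winsock exposes.
    // Whether this is the "buffer space within the transport system"
    // that MSDN means is exactly what I am asking about.
    int sndbuf = 0;
    int optlen = sizeof( sndbuf );
    if( getsockopt( sock, SOL_SOCKET, SO_SNDBUF, (char*)&sndbuf, &optlen ) == 0 )
        printf( "SO_SNDBUF = %d bytes\n", sndbuf );
    else
        printf( "getsockopt failed: %d\n", WSAGetLastError() );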

Moak replied to the original post (#12):

Alternatively, consider TCP for bulk data transfer. With UDP you would also need to do more housekeeping, like throwing away duplicate packets and resending missing ones. With TCP it could be good to use a large send buffer (64 KB or more per sending socket). This can improve throughput because the kernel will have data available as soon as it can send more, without waiting for or notifying the user-space application. Hope it helps.
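A minimal sketch of the buffer tuning I mean (Winsock-style; the 64 KB figure is just an example value):

    // Sketch: enlarge the per-socket send buffer before a bulk transfer
    // so the kernel always has queued data ready to push out.
    int sndbuf = 64 * 1024;   // example value; tune it for your link
    if( setsockopt( sock, SOL_SOCKET, SO_SNDBUF,
                    (const char*)&sndbuf, sizeof( sndbuf ) ) == SOCKET_ERROR )
        printf( "setsockopt(SO_SNDBUF) failed: %d\n", WSAGetLastError() );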

clayman87 replied (#13):

                      As far as I know, there's no way to establish a TCP connection between two passive (behind NAT) clients, which is partly what I want to accomplish.

Moak replied (#14):

                        Peer to peer... no, but maybe with a peer in the middle (one needs to be able to accept incoming connections).

clayman87 replied (#15):

Yes, a third party is inherently needed to establish the connection (a STUN-like rendezvous server), but relaying all the traffic through it is just not an option. As I know of no way to separate the handshake from the traffic like that with TCP, I was forced to switch to UDP.
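(Roughly the scheme I'm after -- a hypothetical sketch of UDP hole punching; the endpoint exchange through the rendezvous server is elided, and the peer address values are placeholders:)

    // Hypothetical sketch of UDP hole punching: both peers learn each
    // other's public IP:port from the rendezvous server (not shown),
    // then fire datagrams at each other. The outbound packets create
    // the NAT mappings that let the inbound ones through.
    u_short peerPort = 0;   // learned from the rendezvous server
    sockaddr_in peer;
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = inet_addr( "...peer public IP..." );  // also from the server
    peer.sin_port = htons( peerPort );

    char probe[] = "punch";
    for( int i = 0; i < 10; ++i )   // a few attempts are usually needed
    {
        sendto( sock, probe, sizeof( probe ), 0, (sockaddr*)&peer, sizeof( peer ) );
        Sleep( 100 );
        // ...poll recvfrom() here; once a datagram from the peer
        // arrives, the hole is punched and bulk transfer can start...
    }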

Moak replied (#16):

Exploring the alternatives... since the routers between the LAN and the Internet are the problem, how about Zeroconf/UPnP (or another application-level protocol) to allow incoming TCP data streams? I am not sure if it is a realistic option.

clayman87 replied (#17):

                              Well, I didn't want to go that way, but I'm going to be out of options very soon, it seems. Thanks for the replies.

Jim_Pen replied to the original post (#18):

Is it possible that your Ethernet hardware does not have a controller? Shouldn't PHY-only adapters use the CPU to move the data?
