Maximum UDP transmit rate is just 50 MB/s, with 100% CPU usage!?

clayman87 (#1)

Hi! I've been experimenting with custom flow control techniques for bulk transfers over UDP when I discovered something very weird. Please take a look at the following code; it just sends 1400-byte UDP datagrams in an endless loop to some IP address.

    // Winsock console program; link with ws2_32.lib.
    #include <winsock2.h>
    #include <conio.h>
    #include <stdio.h>

    int main()
    {
        // Winsock must be initialized before any socket call.
        WSADATA wsa;
        if( WSAStartup( MAKEWORD( 2, 2 ), &wsa ) != 0 )
            return -1;

        SOCKET sock = socket( AF_INET, SOCK_DGRAM, IPPROTO_UDP );

        sockaddr_in targetAddr;
        targetAddr.sin_family = AF_INET;
        targetAddr.sin_addr.s_addr = inet_addr( "...some IP..." );
        targetAddr.sin_port = htons( 1337 );

        char arr[1400] = { 0 };
        long long sent = 0;
        while( !_kbhit() )
        {
            for( int i = 0; i < 1000; ++i )
            {
                int res = sendto( sock, arr, sizeof( arr ), 0,
                                  (sockaddr*)&targetAddr, sizeof( targetAddr ) );
                if( res == SOCKET_ERROR )
                {
                    printf( "Error: %d\n", WSAGetLastError() );
                    return -1;
                }
                sent += res;
            }
            printf( "\r%ld MBs sent", (long)(sent >> 20) );
        }
        return 0;
    }

When I run the program, every sendto() call succeeds and reports having sent 1400 bytes of data. The interesting thing is that I get a transfer rate of only about 50 MB/s, with 100% CPU usage on one core (mostly kernel-mode). Now:

- My computer is connected to a 100BaseTX Ethernet network, which obviously cannot carry that rate, so datagrams are already being lost before they even reach the network. Why does sendto() then report having sent the data, and moreover, why does it not block when the I/O buffers fill up? (The documentation says that it should.)
- How on Earth can someone utilize the full potential of, say, a Gigabit Ethernet network if merely sending data at half of its capacity already causes maximum CPU load?

So, what am I doing wrong? Why on Earth does sendto() take so long? Any suggestion is very welcome. Thanks, clayman

P.S.: I ran a test sending 140 bytes at a time, and the transfer rate dropped to roughly 5 MB/s, so the _number_ of sendto() calls seems to be the bottleneck.
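
For reference, here is a minimal sketch of what pacing the same loop to a fixed target rate might look like, assuming the Winsock setup from the listing above plus <windows.h>; the ~12 MB/s target (roughly the 100BaseTX payload rate) and the sleep-based throttle are illustrative choices, not part of the original test:

    // Hypothetical pacing sketch: throttle the send loop to a target rate
    // so datagrams are not queued up faster than the link can drain them.
    // Assumes sock/targetAddr/arr from the listing above and <windows.h>.
    const double targetBytesPerSec = 12.0 * 1024 * 1024;  // illustrative target

    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency( &freq );
    QueryPerformanceCounter( &start );

    long long paced = 0;
    while( !_kbhit() )
    {
        int res = sendto( sock, arr, sizeof( arr ), 0,
                          (sockaddr*)&targetAddr, sizeof( targetAddr ) );
        if( res == SOCKET_ERROR )
            break;
        paced += res;

        // If we are ahead of the byte budget, sleep until it catches up.
        QueryPerformanceCounter( &now );
        double elapsed = (double)(now.QuadPart - start.QuadPart) / freq.QuadPart;
        double ahead = paced / targetBytesPerSec - elapsed;
        if( ahead > 0.001 )
            Sleep( (DWORD)(ahead * 1000.0) );
    }

Sleep() only has millisecond granularity, so this smooths the rate in bursts rather than per packet, but it is enough to test whether the CPU load tracks the number of sendto() calls rather than the bytes moved.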

Lost User (#2)

    Do you have a firewall?

clayman87 (#3)

Yes, I do, but I sent the program to some fearless friends without a firewall, and they experienced similar results on Windows XP. Meanwhile, I tested the code on other OSes as well. The measurements for 1400-byte UDP datagrams:

- Windows XP SP3 (with firewall): 66 MB/s, 100% CPU usage on one core (mostly kernel)
- Windows 7 RC1 (no firewall): 11.5 MB/s, but only 20% CPU usage per core; it seems to take the interface capacity into account (which is, after all, what I would expect in the first place)
- Ubuntu 8.10 on VMware with Tools: 120 MB/s, 100% CPU usage on one core (mostly kernel)

In contrast, I've been able to push data through a loopback TCP connection at 330 MB/s on Windows XP/CLR, and at nearly 500 MB/s on Ubuntu.

Adam Roderick J (#4)

For gigabit transfer, CPU speed is a big factor, so may I ask: what is the speed of the CPU?

clayman87 (#5)

Thank you for your question. The test machine was an Intel P35 chipset-based system: a Core 2 Duo E6400 at 2133 MHz, 2 GB of RAM, and a Gigabyte P35-DS3 motherboard.

killabyte (#6)

http://www.wireshark.org/ for all your network debugging needs. This is honestly the best thing since sliced bread among networking tools.

clayman87 (#7)

I took your advice and did some testing in Wireshark. Judging by the "Identification" field in the IP header, which seems to be assigned sequentially, everything looks fine down to the bottom of the IP layer, but Wireshark only registers every fifth packet or so as an Ethernet frame. That correlates well with a ~60 MB/s data stream going down a pipe with a capacity of 12.5 MB/s (100 Mbit/s), since 60 / 12.5 is roughly 5. As I've learned, this layer is usually implemented in the network card driver, so poorly written driver software or the interface specification may be the cause of this "no block" problem. As for the transfer rate, I'm still puzzled.

Adam Roderick J (#8)

The CPU is not the problem here; the high-speed bus interfacing to your NIC is. Check the speed of that bus, it has a lot to do with it. If the bus cannot support more than 100-200 MB/s, then we cannot do anything. :(

clayman87 (#9)

It is on a PCI-E 1.0 lane, which is 250 MB/s, so it should be enough, and on Ubuntu 8.10 it evidently is (a gigabit link needs only 125 MB/s). The question is what causes the 100% CPU load on Windows XP at a mere 60 MB/s, and on Ubuntu at 130 MB/s.

David Crow (#10)

clayman87 wrote:

Why does sendto() then report having sent the data...

One of the fundamental differences between UDP and TCP is that UDP is an unreliable transport service: it does not guarantee datagram delivery, and no notification is sent if a packet is not delivered.

                    "Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown

                    "Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons

                    C 1 Reply Last reply
                    0
                    • D David Crow

                      clayman87 wrote:

                      Why does sendto() then reports having sent the data...

                      One of the fundamental differences between UDP and TCP is that UDP is an unreliable transport service because it does not guarantee data packet delivery and no notification is sent if a packet is not delivered.

                      "Old age is like a bank account. You withdraw later in life what you have deposited along the way." - Unknown

                      "Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons

clayman87 (#11)

Let me rephrase my question. MSDN states: "If no buffer space is available within the transport system to hold the data to be transmitted, sendto will block unless the socket has been placed in a nonblocking mode." At which layer of the protocol stack is this buffer space located?

Moak (#12)

Alternatively, consider TCP for bulk data transfer. With UDP you would also need to do more housekeeping, like throwing away duplicate packets and resending missing ones. With TCP it can be good to use a large send buffer (64 KB or more per sending socket), as sketched below; this can improve throughput because the kernel will have data available as soon as it can send more, without waiting for or notifying the user-space application. Hope it helps.
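
A minimal sketch of the send-buffer advice, assuming a Winsock socket like the one in the original post; SO_SNDBUF is the per-socket buffer that the MSDN quote earlier in the thread refers to, and the 256 KB figure is an illustrative choice:

    // Query and enlarge the per-socket send buffer (SO_SNDBUF).
    // The 256 KB value is illustrative, not a recommendation.
    int sndbuf = 0;
    int optlen = sizeof( sndbuf );
    if( getsockopt( sock, SOL_SOCKET, SO_SNDBUF, (char*)&sndbuf, &optlen ) == 0 )
        printf( "current send buffer: %d bytes\n", sndbuf );

    sndbuf = 256 * 1024;
    if( setsockopt( sock, SOL_SOCKET, SO_SNDBUF, (char*)&sndbuf, sizeof( sndbuf ) ) != 0 )
        printf( "setsockopt(SO_SNDBUF) failed: %d\n", WSAGetLastError() );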

clayman87 (#13)

As far as I know, there's no way to establish a TCP connection between two passive (behind NAT) clients, which is partly what I want to accomplish.

Moak (#14)

clayman87 wrote:

As far as I know, there's no way to establish a TCP connection between two passive (behind NAT) clients, which is partly what I want to accomplish.

Peer to peer... no, but maybe with a peer in the middle (one of the two needs to be able to accept incoming connections).

clayman87 (#15)

Yes, a third party is inherently needed to establish the connection (a STUN server), but relaying all the traffic through it is just not an option. As I know of no way to separate the two (handshake vs. traffic) with TCP, I was forced to switch to UDP.
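
A minimal, hypothetical sketch of the UDP hole-punching idea being described here; the exchange of public endpoints through the rendezvous (STUN-style) server is elided, so the peer address below is a placeholder learned out of band, and the payload and retry count are illustrative:

    // Hypothetical hole-punching sketch. Both peers run this at roughly the
    // same time, using the same local port they used when contacting the
    // rendezvous server, so their NATs open matching mappings.
    sockaddr_in peer;
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = inet_addr( "...peer public IP..." );  // learned via rendezvous
    peer.sin_port = htons( 1337 );                               // illustrative port

    // Fire a few punch packets; the first ones may be dropped by the peer's
    // NAT until its own outbound packet has opened a mapping.
    for( int i = 0; i < 10; ++i )
    {
        sendto( sock, "punch", 5, 0, (sockaddr*)&peer, sizeof( peer ) );
        Sleep( 100 );
    }

    // Once both sides have sent, datagrams can flow directly between peers.
    char buf[1400];
    sockaddr_in from;
    int fromLen = sizeof( from );
    int got = recvfrom( sock, buf, sizeof( buf ), 0, (sockaddr*)&from, &fromLen );
    if( got != SOCKET_ERROR )
        printf( "direct UDP path established (%d bytes)\n", got );

Note that this only works through NATs that keep consistent mappings; symmetric NATs generally defeat it, which is why relay fallbacks exist.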

Moak (#16)

Exploring the alternatives... since the routers between the LAN and the internet are the problem, how about Zeroconf/UPnP (or another application-level protocol) to allow incoming TCP data streams? I am not sure whether it is a realistic option, though.

clayman87 (#17)

Well, I didn't want to go that way, but I'm going to be out of options very soon, it seems. Thanks for the replies.

Jim_Pen (#18)

Is it possible that your Ethernet hardware does not have its own controller? Shouldn't PHY-only adapters have to use the CPU to move the data?
