Data replication
-
[Not a programming question...] I've tried Googling and it's likely I'm not using the correct terminology, so I'll ask here.

My company will in the future need to be able to update a number of machines (about 20) with mapping data. This data (hundreds of thousands of small files) is likely to be approaching 1TB and needs to be copied to each machine. These updates are likely to need to happen every few months in the first year, but will settle down once all the bugs are fixed. It is fairly likely that incremental updates will not help, due to the nature of the data content.

Yesterday, a very simple update with a reduced data size took 23 hours to copy using Robocopy [across a verified 1Gbit LAN]. Kicking off 20 parallel Robocopies is likely to kill the performance of the "source" machine, while running the tasks in sequence is going to take far too long and need too much manual input. Does anyone know of a better way?

Wearing my programming hat, I'd have a "transmitter" on the source machine multicasting [even broadcasting, since it's a dedicated LAN and the machines will be off-line during this update] to each destination, which, using its client s/w, would write the transmitted data. This should, in theory, allow all 20 machines to be updated in the same time as one.

Before I resort to having development of such a tool added to our development schedule - does something like this already exist? Ideally free or lowish-cost [e.g. < $2000]. Is there an easier solution?
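For what it's worth, here's roughly what I have in mind for the transmitter side - a minimal Python sketch, assuming a made-up multicast group address, and ignoring lost-datagram recovery and file metadata (which is where the real work would be):

    import socket
    import struct

    MCAST_GRP = '239.1.1.1'   # assumed group address on the dedicated LAN
    MCAST_PORT = 5007
    CHUNK = 1400              # keep each datagram under the Ethernet MTU

    def transmit(path):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        with open(path, 'rb') as f:
            seq = 0
            while True:
                data = f.read(CHUNK)
                if not data:
                    break
                # prefix a sequence number so receivers can detect dropped
                # datagrams and ask for them again out of band
                sock.sendto(struct.pack('!I', seq) + data, (MCAST_GRP, MCAST_PORT))
                seq += 1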
Regards, Ray
-
I'd have each client running a BITS (Background Intelligent Transfer Service) app, pulling the data from the server; each client will run at a different speed. It works well in the wild (Windows Update). I haven't tried it yet, but I may need to go down the same road in the future. Alternatively, put a second disk on each machine with the data on it and ghost/clone each disk - time taken to change each disk: around five minutes.
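As a sketch, each client could kick off the pull with the stock bitsadmin tool - untested, and the job name, server URL, and paths here are made up for illustration; it also assumes the data is packaged into one archive exposed over HTTP:

    import subprocess

    # one BITS job per client, pulling from the source server
    subprocess.run([
        'bitsadmin', '/transfer', 'MapDataUpdate',
        '/download', '/priority', 'normal',
        'http://source-server/mapdata/update.zip',   # assumed server share
        r'C:\MapData\update.zip',                    # assumed local destination
    ], check=True)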
Brainware Error - reboot required.
-
We use a sync tool that spawns a copy of itself on the target machine after a successful copy, with instructions to begin transferring to a node awaiting update. The initial process keeps a list of who has been updated and manages the number of concurrent copies that are running. So you start with 1 copy at first, but it quickly jumps to 2, 4, 8... concurrent copies. It's an in-house, non-Windows app, so no link, but I imagine this wouldn't be too difficult to replicate on a Windows LAN.
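The coordinator logic is roughly this - a Python sketch of the idea, not our actual tool; copy_cmd is a placeholder that would build whatever copy command line you use (e.g. a robocopy invocation) for a given source/destination pair:

    import subprocess
    from collections import deque

    def fan_out(master, targets, copy_cmd):
        sources = deque([master])   # nodes that already hold the data
        pending = deque(targets)    # nodes still awaiting the update
        while pending:
            wave = []
            # pair every node that has the data with one still waiting,
            # so the number of concurrent copies grows 1, 2, 4, 8...
            while sources and pending:
                src, dst = sources.popleft(), pending.popleft()
                wave.append((src, dst, subprocess.Popen(copy_cmd(src, dst))))
            for src, dst, proc in wave:
                proc.wait()
                sources.append(src)
                sources.append(dst)   # a freshly updated node becomes a source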
-
You could use BITS, but it uses unicast, so the network would still be bogged down. And the problem with multicast is that it would normally use UDP, so a dropped packet would render a bad copy. There is a project on SourceForge[^] that seems abandoned, but it appears to have solved that problem by using a multicast UDP stream for the data and TCP connections for synchronization & control. You could also check whether there is any useful BitTorrent program for the purpose.
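The receiving side of such a scheme might look like this - a minimal sketch, assuming the sender prefixes each datagram with a sequence number; the TCP control connection for requesting retransmissions is left out:

    import socket
    import struct

    def listen(grp='239.1.1.1', port=5007):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.bind(('', port))
        # join the multicast group on all interfaces
        mreq = struct.pack('4sl', socket.inet_aton(grp), socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        expected, missing = 0, set()
        while True:
            packet = sock.recv(65535)
            seq = struct.unpack('!I', packet[:4])[0]
            if seq > expected:                 # gap: datagrams were dropped
                missing.update(range(expected, seq))
            missing.discard(seq)
            expected = max(expected, seq + 1)
            # ...write packet[4:] to disk, and periodically ask the sender
            # to resend everything in `missing` over the TCP channel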
-
I'm thinking BitTorrent as well. However, if you have the budget, could you not use some sort of removable hard drive rack, copy the data to the drives directly from the source, then plug them into the other machines? It seems like a network is just too limiting for what you need. Hell, even opening the boxes and plugging in the hard drives is faster than copying.
When everyone is a hero no one is a hero.
-
Why not use something like TrueImage Server to create an image and then replicate that image to your target machines?
-
John C wrote:
I'm thinking BitTorrent as well.
It's a possibility - I forgot that torrents can do multiple files. Although the hundreds of thousands of files will most likely result in a very large .torrent - perhaps even too big [although maybe it could still be handled with a pt1 and pt2 type strategy]. The difficulty will be getting our IT department [whose job it is to obstruct us at every turn] to allow running of the torrent s/w for trial purposes - although I'm sure I could isolate things.
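Back of the envelope, the piece hashes themselves shouldn't be the problem (assuming standard 20-byte SHA-1 piece hashes and a 4 MiB piece size) - it's the path list that bloats things:

    piece_size = 4 * 2**20                  # 4 MiB pieces
    data_size  = 10**12                     # ~1 TB of mapping data
    pieces     = data_size // piece_size    # ~238,000 pieces
    hash_bytes = pieces * 20                # ~4.8 MB of SHA-1 piece hashes
    # plus the file list: at roughly 100 bytes per entry, several hundred
    # thousand files add tens of MB to the .torrent on their own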
John C wrote:
could you not use some sort of removable hard drive rack, copy the data to the drives directly from the source, then plug them into the other machines?
It's an option still under consideration. However, even with a disk-to-disk copy over SATA, it's fairly likely to still take hours for each transfer - just "preparing to copy" with Explorer took nearly an hour, hence me telling people about Robocopy. Even if it dropped to 1 hour per update (and assuming one at a time), it's going to be a couple of long days for the engineer.
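Rough numbers for the disk-swap option, assuming ~80 MB/s sustained - probably optimistic given hundreds of thousands of small files:

    data_size = 10**12                    # ~1 TB
    rate      = 80 * 10**6                # ~80 MB/s sustained disk-to-disk
    hours     = data_size / rate / 3600   # ~3.5 hours per copy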
Regards, Ray
-
Chris Austin wrote:
Why not use something like TrueImage Server to create an image and then replicate that image to your target machines?
I was thinking "Ghost" at one point, but am not sure whether that will still kill the "source" machine whilst it tries to supply 20 machines.
Regards, Ray
-
norm .net wrote:
I'd have each client running a BITS (Background Intelligent Transfer Service) app
My personal experience with BITS (Windows Update / MSDN) is that it's too slow - I have a 20Mbit connection at home and it rarely stresses it. In the situation I've described, the machines share a 1Gbit LAN - actually, each machine is connected to at least 3x 1Gbit LANs for traffic-separation purposes, and ALL of this could be used during the updates [as the system will be down as far as the users are concerned].
Regards, Ray
-
Ray Hayes wrote:
just "preparing to copy" with explorer took nearly an hour,
I think you're better off doing it from the command prompt; I believe it's slightly faster. Also, can't you do the copy to the hard drive on the server end, then just plug the HD into the clients?
-
The tools I think you need are included in the Windows Server 2003 or XP resource kit. From the documentation:

"MQcast.exe and MQcatch.exe have the following major features:
• Files are copied from a single server to multiple clients, or listeners, using reliable IP multicasting through the Pragmatic General Multicast (PGM) driver, which is installed and configured as part of the Core subcomponent during every installation of Message Queuing 3.0.
• The transmission bandwidth does not depend on the number of listeners.
• Files are copied using Message Queuing (MSMQ), so the programs continue to copy files even if there are network problems and connectivity interruptions.
• The file data are transmitted in express messages, which are stored only in RAM, but listeners use metadata transmitted along a second channel to ensure that they receive all the file data transmitted.
• Files and folder trees can be copied.
• Files can be copied from/to computers operating in workgroup mode (without access to Active Directory) or in domain mode (with access to Directory Service).

The following requirements must be satisfied before running MQcast.exe and MQcatch.exe:
• Windows XP Professional must be installed with Message Queuing 3.0 on the transmitter and the listeners.
• The non-MSMQ routers operating in the network must be multicast-enabled."
-
A Wong wrote:
I think you're better off doing it from the command prompt; I believe it's slightly faster. Also, can't you do the copy to the hard drive on the server end, then just plug the HD into the clients?
I have been, am, and always will be a command-line person - I caught the copy being performed by a team member using Explorer. When I asked how long it had been "preparing", the answer was about 35 minutes. After a few more minutes I asked them to abort and use the command line.
Regards, Ray
-
Ray Hayes wrote:
I was thinking "Ghost" at one point, but am not sure whether that will still kill the "source" machine whilst it tries to supply 20 machines.
I don't know about Ghost, but when I last used TrueImage Server it was pretty speedy and didn't hammer the performance of the source machine during the incremental backups/updates. Of course, during the initial backup it ground the machine to a halt. Acronis used to have a free and fully functional trial of their server products, so you can always give it a shot and see how it handles and whether it meets your needs.
-
Ray Hayes wrote:
I caught the copy being performed by a team member using Explorer. When I asked how long it had been "preparing", the answer was about 35 minutes.
The "preparing" is because explorer is copying via Windows disk cache. Seemingly without multithreading. :sigh: I'm not using explorer!
-
BitTorrent?
"mostly watching the human race is like watching dogs watch tv ... they see the pictures move but the meaning escapes them"
-
If you solve the problem, would you please send me a message? I would appreciate that a lot, because I might end up in a similar situation. Regards /Jörgen
-
If these machines are physically located near each other, I'd be tempted to use a removable hard drive. Copy the server data to the first one, and then use it as a master on a separate machine to duplicate to the others. That would minimize your load on the source material.
Software Zen:
delete this;