Did you know...
-
It's probably to encourage smaller source files: 10MB of code in one file is probably a little too big ... :laugh: Why on earth do you want to load a 1GB XML anyway? That's far too big for me to want to read!
Sent from my Amstrad PC 1640 Never throw anything away, Griff Bad command or file name. Bad, bad command! Sit! Stay! Staaaay... AntiTwitter: @DalekDave is now a follower!
I just wanted to see what was in it. :) They range from 1 MB up to about 3.5 GB. I have no control over how large the files to be processed are (they're generated by Nessus security scans). The idiots that generate the files are completely unwilling to accommodate us, so it's essentially an "it is what it is" situation. I have to parse these files and store the results in our database. Using just XDocument, I was running out of memory (the server in question only has 8 GB, of which most is already used by other processes), so I have to resort to using a combination of XmlReader and LinqToXml. Notepad, IE, Firefox, WordPad, and MS Word all load the file, but it takes more than five MINUTES for them, and WordPad/Word become completely unusable. <rant> I wish people here (not you, but some others) would stop f*ckin' assuming I'm a rookie programmer. I have more years in the industry than most people on CP have even been alive. </rant>
".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013 -
That's a problem with XML; it has to be read in its entirety before you can do anything with it. At work I receive a 6GB XML file every stinking day, and I have to use SSIS to get it into a database. I'm beginning to prefer JSON, which I can read one object at a time (provided the outermost value is an array of objects). However, I have written a fairly simple XML file splitter so I can make smaller files from one big one when I need to find out where a problem (e.g. non-well-formed XML) exists.
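For what it's worth, that "one object at a time" JSON reading can be done without ever holding the whole file in memory. A rough sketch using Json.NET's JsonTextReader - the ScanRecord type and its properties are made up for illustration, not taken from any real feed:

using System;
using System.IO;
using Newtonsoft.Json;

// Hypothetical record type - the real properties depend on the feed.
class ScanRecord
{
    public string Host { get; set; }
    public string Finding { get; set; }
}

static class JsonStreaming
{
    public static void Process(string path)
    {
        var serializer = new JsonSerializer();
        using (var file = File.OpenText(path))
        using (var reader = new JsonTextReader(file))
        {
            // Walk the token stream; deserialize one object per StartObject
            // token so only a single record is ever held in memory.
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    var record = serializer.Deserialize<ScanRecord>(reader);
                    Console.WriteLine($"{record.Host}: {record.Finding}");
                }
            }
        }
    }
}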
That's not entirely true. You can use XmlReader, and it sequentially reads one node at a time (it's slower than XDocument, and you can't reverse the read direction, but it solves my issue).
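Roughly, the XmlReader / LINQ-to-XML combination looks like this: stream forward with XmlReader and only materialize one element at a time via XNode.ReadFrom. A minimal sketch - the "ReportItem" element name and "pluginName" attribute are assumptions about the Nessus report schema, not verified:

using System;
using System.Xml;
using System.Xml.Linq;

static class XmlStreaming
{
    public static void Process(string path)
    {
        using (var reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "ReportItem")
                {
                    // ReadFrom consumes just this subtree and leaves the reader
                    // positioned on the node that follows it, so memory stays
                    // bounded no matter how big the file is.
                    var item = (XElement)XNode.ReadFrom(reader);
                    Console.WriteLine(item.Attribute("pluginName")?.Value);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}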
-
John Simmons / outlaw programmer wrote:
<rant> I wish people here (not you but some others) would stop f*ckin assuming I'm a rookie programmer. I have more years in the industry than most people on CP have even been alive. </rant>
It's been a while since I've seen you assert yourself on CP. I kinda miss the ol' smackdowns.
-
Ouch! That's a stupid amount of data, particularly for a text-based transfer mechanism. Have these people never heard of databases? On the bright side, at least it's not XLSX? :laugh:
-
OriginalGriff wrote:
On the bright side, at least it's not XLSX?
Dunno, XLSX isn't so bad, and it's easier (OK, lazier) to debug if there are bad data elements: just load it into Excel and scroll down to the line with the issue. If you're suggesting Interop (i.e. slower than molasses), that's a completely different issue, and there are way, way faster [read & write] alternatives. Worst comes to worst, you can unpack the XLSX and voilà, it's XML (pretty much exactly the same). (Not criticizing, just unsure why you think it's any worse.)
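To the "unpack the XLSX and it's XML" point: an .xlsx file is just a zip archive of XML parts, so a few lines of System.IO.Compression are enough to peek inside without going anywhere near Excel or Interop. Purely illustrative:

using System;
using System.IO.Compression;

static class XlsxPeek
{
    public static void ListParts(string path)
    {
        // Open the workbook as the zip archive it really is and list the
        // XML parts inside (xl/worksheets/sheet1.xml, xl/sharedStrings.xml, ...).
        using (var archive = ZipFile.OpenRead(path))
        {
            foreach (var entry in archive.Entries)
            {
                Console.WriteLine($"{entry.FullName} ({entry.Length:N0} bytes)");
            }
        }
    }
}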
-
If you just want to take a peek in the file you can use the Lister that comes with Total Commander. It's still immediate on a 20GB file. No specific support for XML though, it's treated the same as any file.
Wrong is evil and must be defeated. - Jeff Ello
-
John Simmons / outlaw programmer wrote:
VS (2017) has a maximum supported file size of 10 MB? I just found out myself while trying to load a 925 MB XML file.
".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013Methinks you need to rethink your XML design
CQ de W5ALT
Walt Fair, Jr., P. E. Comport Computing Specializing in Technical Engineering Software
-
Have you ever tried to load 1GB into Excel? :omg: (And bear in mind that XLSX is packaged, zipped, XML - and thus slower and more memory hungry than "naked" XML)
-
I'd recommend UltraEdit. You can disable the "make automatic backups when opening files" option and then you are able to open and work with very large files. Fast. That feature, and the built-in hex editor that lets me see everything, including BOM bytes in files, makes it worth the license fee. Just in case you didn't know it - and needed something better than Notepad and Notepad++ for large files :)
Do you know why it's important to make fast decisions? Because you give yourself more time to correct your mistakes, when you find out that you made the wrong one. Chris Meech on deciding whether to go to his daughters graduation or a Neil Young concert
-
John Simmons / outlaw programmer wrote:
I wish people here (not you but some others) would stop f*ckin assuming I'm a rookie programmer.
Just like at work, other people mess up, but you get the blame!
John Simmons / outlaw programmer wrote:
I have more years in the industry than most people on CP have even been alive.
That's no guarantee of actually being a good programmer. For example, the programmer who gives you 3.5 GB of XML in a single file probably says the same :rolleyes:
Best, Sander sanderrossel.com Continuous Integration, Delivery, and Deployment arrgh.js - Bringing LINQ to JavaScript Object-Oriented Programming in C# Succinctly
-
Combining XmlReader and LinqToXML, the memory consumption never goes above 350 MB, and it takes about 45 minutes to run through the sample files (this includes adding the data to the database, one record at a time - 426,000 records). When I add a dash of TPL, it only takes about 9 minutes to process the same three files. I think I could get it even faster if I inserted multiple records per query, but I'm tired of dickin' with it.
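For the curious, that "dash of TPL" could be as simple as keeping the parse single-threaded and fanning the per-record database work out with Parallel.ForEach. This is only a sketch of that shape - ParseRecords and InsertRecord are hypothetical stand-ins for the streaming parser and the single-row INSERT, not the actual code:

using System.Collections.Generic;
using System.Threading.Tasks;
using System.Xml.Linq;

static class ParallelImport
{
    public static void Run(IEnumerable<string> files)
    {
        foreach (var file in files)
        {
            // The reader stays single-threaded; only the database work is
            // spread across workers. Cap the parallelism so the DB isn't flooded.
            Parallel.ForEach(
                ParseRecords(file),
                new ParallelOptions { MaxDegreeOfParallelism = 4 },
                record => InsertRecord(record));
        }
    }

    // Hypothetical helpers standing in for the XmlReader/LINQ-to-XML parse
    // and the parameterized single-row insert.
    static IEnumerable<XElement> ParseRecords(string file) { yield break; }
    static void InsertRecord(XElement record) { }
}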
".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013 -
OriginalGriff wrote:
Have these people never heard of databases?
That's our job. :) Got memory consumption down to no more than 350 MB, and it only takes 9 minutes to process my three sample files, for a total of 426,000 records. I'm going to look awesome on Tuesday. Upside: this app replaces a large Perl script that was doing the same job, and everyone in the shop can maintain it because - well - it's not Perl. :)
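Tangentially, the "multiple records per query" idea mentioned earlier could look something like the batch below: build one INSERT with several parameterized VALUES rows instead of one round trip per record. The table and column names are made up for the example:

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;

static class BatchedInsert
{
    // Hypothetical: insert a batch of records in a single INSERT statement.
    public static void InsertBatch(SqlConnection conn, IReadOnlyList<(string Host, string Finding)> batch)
    {
        var sql = new StringBuilder("INSERT INTO ScanResults (Host, Finding) VALUES ");
        var cmd = new SqlCommand { Connection = conn };
        for (int i = 0; i < batch.Count; i++)
        {
            if (i > 0) sql.Append(", ");
            sql.Append($"(@h{i}, @f{i})");
            cmd.Parameters.AddWithValue($"@h{i}", batch[i].Host);
            cmd.Parameters.AddWithValue($"@f{i}", batch[i].Finding);
        }
        cmd.CommandText = sql.ToString();
        cmd.ExecuteNonQuery();
    }
}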
".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013 -
Sander Rossel wrote:
For example, the programmer who gives you 3.5 GB of XML in a single file probably says the same
We don't get the files from programmers - we get them from security nazis.
".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013 -
Walt Fair, Jr. wrote:
Methinks you need to rethink your XML design
It ain't my design, and it won't be changing to anything better.
".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013 -
John Simmons / outlaw programmer wrote:
we get them from security nazis.
And on the other side of that barrier there is some poor sod producing the xml. Or it was designed in the 90s and they refuse to even consider changing something that works - sort of.
Never underestimate the power of human stupidity - RAH I'm old. I know stuff - JSOP
-
A scan tool called Nessus generates the file. I know nothing about it, or its configurability where file generation is concerned.
".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013 -
More likely it was designed in the 90s, when the log data was small and (the then new and cutting-edge) XML made some sense. But ... the developer who wrote it moved on, and file formats are boring, so the new guy just tested that it worked at small scale and worked on the sexier stuff. And now ... intrusion / vulnerability data has grown like everything else, and it's just a silly decision in hindsight.
-
The 6GB XML file I have to read is the backup of a third-party system. It's not only XML, but it's all name/value pairs.
-
Well, for the most part I'm limited to built-in SSIS components. Potentially I could write something custom, as I have for JSON and CSV files (ones which aren't stable enough for the flat-file components).