Printing web pages - why is this so hard?
-
The root of the issue is that Microsoft has a tendency to delete web pages documenting their sins. I live in the embedded world where things never die. So, in my support documentation, I cannot reference web links - I embed captured PDF files. What I have discovered is that printing a web page inevitably leads to text truncation, something I cannot tolerate. For example, if I use Firefox or Chrome to "print" to PDF, chunks of text are missing. It's hit or miss. Using IE (I mean MS made it, right?), I still get the same result. I saved the entire page - IE won't load it. Seems very random. Suggestions?
Charlie Gilley Stuck in a dysfunctional matrix from which I must escape... "Where liberty dwells, there is my country." B. Franklin, 1783 “They who can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.” BF, 1759
I used to get this often in Firefox, until I found out a best-kept secret: Firefox has two built-in print systems. If you press Ctrl+P like everyone does, you get the standard, rubbish print system that truncates text. If, however, you click the hamburger menu at the top right and pick the Print option, you get the second print provider, which gives you access to all sorts of formatting options and does a better job too, especially when printing to a virtual printer.
-
You could use FS Capture or some other screen capture software. I sometimes use FS Capture because it has a command for capturing the whole scrolling page.
/Mikael
I was going to be a smartass and point out that PrintScreen has never introduced any formatting error for me, but obviously that's only good for a page that fits entirely on a single screen. This looks like a smarter solution if it can do the whole page even when it requires scrolling. OTOH, you still end up with an image and lose all context. Chrome's built-in dev tools can do some decent things that do keep the DOM elements - they really should leverage that to help customize printing. Of course it'd have to be called expert mode printing or some-such...
-
dandy72 wrote:
but obviously that's only good for a page that fits entirely on a single screen
As if you couldn't open a Word file, set the margins to the minimum, paste the picture and go for the next screenshot... repeat until the whole web page is ready to be printed.
M.D.V. ;) If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about? Help me to understand what I'm saying, and I'll explain it better to you Rating helpful answers is nice, but saying thanks can be even nicer.
-
I have been fighting to get decent printouts for years, and I have given up. Today, I would rather save the complete web page as an HTML file, with an associated subdirectory for images, style sheets and whatever. Most browsers provide this as a menu item or keyboard shortcut that takes care of everything. A small disadvantage, if you save hundreds of these pages, is that you end up with hundreds of copies of the same icons, images, common script snippets etc., one per saved page. But disk is cheap nowadays; it is not really a big issue. My experience is that this works a lot better than making PDF files for printing.

10-15 years ago, there was a whole crowd of "web harvesters" that allowed you to download an entire web site. They would keep the URL structure as a directory tree, so that e.g. icons and images were stored only once, which gives a much cleaner structure if you want offline access to an entire website, or a major part of one. Fifteen years ago, there were still a few web pages here and there with more or less static, plain text/graphics info, so it used to work quite well. Nowadays, when 99% of web pages are built on the spot for each request, and much of the information presented is retrieved from a remote database as you move around in the page, the harvesters (crawlers, scrapers, ... lots of names are in use) are not as useful as they used to be. Googling for e.g. "web harvesting" gives you enough links to keep you busy until the pandemic is over :-)
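To illustrate the "keep the URL structure as a directory" idea, here is a minimal sketch in Python of how a harvester might map each URL to a local file, so that a shared icon or image lands in the same place no matter how many pages reference it. The function name `local_path_for` and the `mirror` root directory are made up for this example, and ignoring the query string is a simplifying assumption; real harvesters handle many more cases.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url: str, root: str = "mirror") -> PurePosixPath:
    """Map a URL to a path under `root` that mirrors the site's URL structure.

    Hypothetical helper for illustration only. The query string and the
    fragment are ignored, so two URLs that differ only in those parts
    map to the same local file.
    """
    parts = urlsplit(url)
    path = parts.path
    # Directory-style URLs (empty path or trailing slash) get an index file.
    if path == "" or path.endswith("/"):
        path += "index.html"
    # Drop empty, "." and ".." segments so the result stays under `root`.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return PurePosixPath(root, parts.netloc, *segments)
```

Because the path depends only on the host and the URL path, every page that references the same logo resolves to the same local file, which is what avoids the one-copy-per-saved-page duplication described above.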
-
Thanks for raising this issue. I have the same concern.

You asked why this is so hard, so I will attempt to answer that question first: it is so hard because most web system designers do not consider this feature important. In their mind, it is not a requirement, so they neither implement it nor test it. There may also be some web system designers who do not want you to be able to capture the web page, and who actually go out of their way to make this difficult or impossible.

You also asked for suggestions. I believe the reason that printing does not work is that Cascading Style Sheets (CSS) allow the web system designer to change the layout of the web page for different devices (so that the presentation on a small smartphone can have a completely different appearance from that on a large monitor). What is needed is the ability to OVERRIDE the default or provided style sheet for printing and instead use a style sheet suitable for the selected printer. Perhaps this could be implemented in a browser plug-in. This is certainly something I would be interested in.
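As a rough sketch of the override idea, the Python below adds a print-only style block to a saved copy of a page, placed after the page's own styles so that it wins. Everything here is an assumption for illustration: the function name `inject_print_override` is made up, and the CSS rules are only a guess at common causes of truncation (fixed-height containers and hidden overflow), not a fix known to work on any particular site.

```python
import re

# Illustrative rules only; tune them for the page you are printing.
PRINT_OVERRIDE_CSS = """
@media print {
  * { overflow: visible !important; }
  html, body { height: auto !important; }
  nav, aside { display: none !important; }
}
"""


def inject_print_override(html: str, css: str = PRINT_OVERRIDE_CSS) -> str:
    """Return `html` with a <style> block holding `css` added.

    The block goes just before the first closing </head> tag, so it comes
    after the page's own style sheets. If there is no </head>, the block
    is prepended, which browsers generally tolerate.
    """
    style = f"<style>{css}</style>"
    new_html, count = re.subn(
        r"</head\s*>",
        lambda m: style + m.group(0),  # keep the original closing tag
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        new_html = style + html
    return new_html
```

You would run this over a page saved to disk, open the result in the browser and print from there. The `@media print` wrapper leaves the on-screen layout alone and only changes what the printer (or the PDF "printer") sees.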
-
I find the Firefox extension Print Friendly to be indispensable.
-
I very often print to PDF for documentation. What I often resort to in cases like this is selecting the important text (i.e. the actual article, the information that is worth saving, but not the left or right sidebars etc.) and then choosing "Only selection" when printing (using Ctrl+P in Chrome and choosing the printer "Save As PDF"). For some reason, when I do that, the result is often much better formatted. Furthermore, the PDF will be smaller, as it will contain only the important part and not irrelevant text and images relating to other articles etc.
-
I have found that the Chrome extension from PrintWhatYouLike.com works well: it lets you remove all the junk you don't want and then save to PDF. It does a decent job in Chrome. Unfortunately, it does not work well in Firefox, where only a bookmarklet is supported. They also have an extension/bookmarklet that can combine pages where you need to keep clicking "next" to see more content into one single page, but I have not used that one.