Cannot get dynamic javascript content from web page

  • elelont2 wrote (#1):

    Hi, I am trying to use PyQt and Python to get the dynamic content from a web page. The problem is that I still only get the static content. What could be wrong with the code below? The code is based on this link: https://impythonist.wordpress.com/2015/01/06/ultimate-guide-for-scraping-javascript-rendered-web-pages/

    import sys
    import time
    from PyQt4.QtGui import *
    from PyQt4.QtCore import *
    from PyQt4.QtWebKit import *
    from lxml.html import fromstring, tostring, iterlinks

    class Render(QWebPage):
        def __init__(self, url):
            self.app = QApplication(sys.argv)
            QWebPage.__init__(self)
            self.loadFinished.connect(self._loadFinished)
            self.mainFrame().load(QUrl(url))
            # Blocks here until _loadFinished() calls app.quit()
            self.app.exec_()
            print("inside 1")

        def _loadFinished(self, result):
            self.frame = self.mainFrame()
            self.app.quit()
            print("inside 2")

        # User-agent override, currently disabled:
        #def userAgentForUrl(self, url):
        #    return 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36 OPR/32.0.1948.25'

    url = 'http://www.somepage.com'

    r = Render(url)
    print("inside 3")

    # Intended to give the page's JavaScript time to finish
    print("Sleeping..")
    time.sleep(5)
    print("Sleeping done")

    result = r.mainFrame().toHtml()
    print(result.encode('utf-8'))

    I added the sleep(5) to ensure that the dynamic content has time to load, but it does not help. Why doesn't r.mainFrame() contain the dynamically created page contents? Is it not updated after the page-loaded event? Regards
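
One possible explanation, not confirmed anywhere in this thread: `time.sleep(5)` runs after `app.exec_()` has already returned, so the Qt event loop is stopped while the script sleeps, and the page's JavaScript timers and network callbacks cannot fire during that time. A minimal sketch under that assumption is shown here: instead of quitting as soon as `loadFinished` fires, it schedules `app.quit()` with `QTimer.singleShot`, so the event loop keeps running for a while after the initial load. The function name `render` and the parameter `settle_ms` are hypothetical, and the sketch assumes PyQt4 on Python 3, where `toHtml()` returns a `str`.

```python
import sys


def render(url, settle_ms=5000):
    """Load url and return the frame's HTML after an extra settle period.

    Hypothetical helper: PyQt4 is imported lazily so that merely defining
    this function does not require Qt to be installed.
    """
    from PyQt4.QtCore import QTimer, QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebPage

    # Reuse an existing QApplication if one was already created.
    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebPage()

    def on_load_finished(ok):
        # Do not quit immediately: leave the event loop running for
        # settle_ms milliseconds so page scripts can keep executing.
        QTimer.singleShot(settle_ms, app.quit)

    page.loadFinished.connect(on_load_finished)
    page.mainFrame().load(QUrl(url))
    app.exec_()  # returns only after the delayed quit
    return page.mainFrame().toHtml()


if __name__ == "__main__":
    # Placeholder URL taken from the post above.
    print(render("http://www.somepage.com"))
```

A fixed delay is still a guess about how long the page needs. If the content arrives later than `settle_ms`, the returned HTML will again be incomplete, so the value may need tuning per site.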

    • Lost User wrote (#2):

      Wrong forum; please try http://www.codeproject.com/script/Answers/List.aspx?tab=active instead.
