
Tao3w (淘尽网) official blog

Tao3w, http://www.tao3w.com: building the best price-comparison site

 
 
 


Create screenshots of a web page using Python and QtWebKit  

2012-08-24 09:25:35 | Category: python


December 3, 2008

Update 2009-10-03:
For further development and improvements, contact me or have a look at this public GitHub repository created by Adam Nelson.

Update 2010-04-12:
If you need Flash support, you should have a look at the current GitHub version of this script at http://github.com/AdamN/python-webkit2png/ mentioned above. We extended the script a few months ago.


From time to time you may want to create a screenshot of a web page from the command line, for example if you wish to create thumbnails for your web application. So you might search for such a program and find tools like webkit2png, which is for Mac OS X only, or khtml2png, which requires a lot of KDE stuff to be installed on your server.

But since Qt Software, formerly known as Trolltech, integrated Safari’s famous rendering engine WebKit (which is based on Konqueror’s KHTML engine) into its framework, we are now able to make use of it with the help of some Python and PyQt4.

blog_small1.png

If you are in a hurry, click here to get a full-featured version of webkit2png.py.

I assume that you have some basic knowledge of Python. If you run into problems with the Qt part of this tutorial, I suggest having a look at the class documentation first. Please note that Qt is a C++ framework, and most of the example code in its documentation has not been ported to Python. So it might be helpful if you have some basic knowledge of C++, too.

Requirements: Webkit and PyQt4 (packages libqt4-webkit and python-qt4 when you’re using Intrepid Ibex).

So, run your favourite editor (vim, of course) and start to enter some python code. First, we will have to organize some imports:

    #!/usr/bin/env python
    import sys     # required to exit this program
    import signal  # required to catch CTRL-C (I'll explain this later)

    # Some of the PyQt libs
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from PyQt4.QtWebKit import *

Qt is heavily event-based (the mechanism is called “signals” and “slots”), so we have to prepare a “slot” which gets called when the page has been loaded completely:

    def onLoadFinished(result):
        print "loadFinished(%s)" % str(result)
        sys.exit(0)  # this is the moment when we have to quit normally
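To make the idea of a slot that only runs once events are processed more concrete, here is a toy model in plain Python. It is not Qt code: `ToySignal` and `run_event_loop` are invented for illustration, and the model deliberately simplifies things (in real Qt, a directly connected slot runs as soon as the signal is emitted; it is the network activity leading up to the emission that needs a running event loop).

```python
from collections import deque


class ToySignal(object):
    """Toy stand-in for a Qt signal: emit() only queues the slot calls."""

    def __init__(self, queue):
        self._queue = queue
        self._slots = []

    def connect(self, slot):
        # Remember a callable that should be invoked on emit().
        self._slots.append(slot)

    def emit(self, *args):
        # Queue one pending call per connected slot instead of calling it.
        for slot in self._slots:
            self._queue.append((slot, args))


def run_event_loop(queue):
    """Deliver queued calls until the queue is empty."""
    while queue:
        slot, args = queue.popleft()
        slot(*args)


events = deque()
received = []

load_finished = ToySignal(events)
load_finished.connect(received.append)  # the "slot" just records the result
load_finished.emit(True)

pending_before_loop = list(received)    # snapshot taken before the loop runs
run_event_loop(events)

print(pending_before_loop)
print(received)
```

The snapshot taken before `run_event_loop` is still empty, and the recorded result only appears once the loop has run.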

Even if we intend to write a CLI based application, QtWebkit requires a GUI in the background. This is why we have to use QApplication instead of QCoreApplication. And because we will not have any visible controls, we should ensure that we can still quit this application using CTRL-C (this is why we have to import signal):

    app = QApplication(sys.argv)
    signal.signal(signal.SIGINT, signal.SIG_DFL)

Now we can create a QWebPage-Object without any exception or segmentation fault. Connect it with our “onLoadFinished”-slot and load the url you want to make a screenshot of (here I’m using Google):

    webpage = QWebPage()
    webpage.connect(webpage, SIGNAL("loadFinished(bool)"), onLoadFinished)
    webpage.mainFrame().load(QUrl("http://www.google.com"))

If you run this application now, you’ll see… nothing. onLoadFinished might be called, but the result will be “False”. This is because Qt is so extremely event-based, and there is still no main loop to handle these events. So finally you have to start your QApplication:

sys.exit(app.exec_())

If you execute this now, the output should be:

    loadFinished(True)

Good, the page is loaded! The next step is to render it into a file by expanding “onLoadFinished” (this means: all the code from now on has to be INSIDE of “onLoadFinished”). First, we should ensure that we do not proceed if we got an error:

    def onLoadFinished(result):
        print "loadFinished(%s)" % str(result)
        if not result:
            print "Request failed"
            sys.exit(1)

Otherwise, we should enlarge the viewport (that is our virtual browser window) to the desired size. If you want to create a picture of the whole page, you should use the “preferred” size of the contents:

            print "Request failed"
            sys.exit(1)

        # Set the size of the (virtual) browser window
        webpage.setViewportSize(webpage.mainFrame().contentsSize())

And finally, render this into a QImage object and store it in a file:

        # Set the size of the (virtual) browser window
        webpage.setViewportSize(webpage.mainFrame().contentsSize())

        # Paint this frame into an image
        image = QImage(webpage.viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        webpage.mainFrame().render(painter)
        painter.end()
        image.save("output.png")
        sys.exit(0)  # quit this application

Done. Pretty easy, isn’t it? Oh, wait! QWebPage depends on QtGui, and QtGui depends on a running X server (at least on Unix systems). So how can we make use of this on a headless server machine? The answer is Xvfb, a framebuffer-based X server originally designed for testing purposes. Of course, it requires some X libraries and fonts, too (how should a page be rendered without any fonts?), but it has far less overhead than the real X.Org server and doesn’t need to be running all the time. Just call the script this way:

$ xvfb-run --server-args="-screen 0, 640x480x24" python webkit2png-simple.py

The screen size doesn’t matter, but the color depth of 24 bit is important. Otherwise, the resulting screenshot would be limited to 256 colors. For more options, have a look at the man pages of ‘Xvfb’ and ‘xvfb-run’.
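If the screenshot script is started from another Python program, the same command line can be assembled as an argument list. The following is only a sketch: the helper name `build_xvfb_command` and its parameters are made up for illustration, and it does nothing more than mirror the `xvfb-run --server-args=...` invocation shown above.

```python
def build_xvfb_command(script, width=640, height=480, depth=24):
    """Return an argv list that runs `script` under xvfb-run.

    Hypothetical helper: it only reproduces the command line from the text,
    with the screen geometry and color depth made configurable.
    """
    screen = "-screen 0, %dx%dx%d" % (width, height, depth)
    return ["xvfb-run", "--server-args=" + screen, "python", script]


command = build_xvfb_command("webkit2png-simple.py")
print(" ".join(command))
```

Passing such a list to `subprocess.call(command)` would start the script without any shell quoting, assuming `xvfb-run` and `python` are on the PATH.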

Last, but not least, I’ll provide you with two versions of this script. webkit2png-simple.py is exactly the result of this tutorial, while webkit2png.py is a much improved version with command line arguments and coded in OOP style (see the GitHub repository for the most recent version).

Update 2009-04-01
Here’s another guy who had the same idea earlier than me.

Posted by Cybso
Tags: english, HowTo, Linux, Programming

79 Responses to “Create screenshots of a web page using Python and QtWebKit”

  1. Uniblogs · Uniblogs im Rückspiegel: Was wichtig war in KW 49 Says: 
    December 9th, 2008 at 21:20

    [...] has written. The topics are, as always, very varied: speculation about the Uniblogs, the search for election helpers, screenshot automation with Python and Qt’s WebKit (very handy!), the 2009 university elections, nVidia and the Intrepid Ibex, and [...]

  2. Screenshot a URL with Python and Qt and WebKit - the renaissance man Says: 
    February 13th, 2009 at 08:22

    [...] when I found the work of Roland Tapken. His script and explanation were the solution I needed. It made nice screenshots, had the [...]

  3. Alex Ezell Says: 
    February 18th, 2009 at 04:24

    Roland, I am having trouble using this script. All of the screenshots turn out fine, but it seems like the Xvfb servers are not killed or exited properly. So, as I create screenshots, Xvfb processes are left behind every time the script runs. Do you have any thoughts why this might be happening?

  4. Roland Says: 
    February 18th, 2009 at 11:01

    This sounds strange because the application exits immediately after the image is written to disk. It might be a problem with xvfb-run.

    I’ve read in your blog that you modified the script. This should not be necessary when you make the file executable and run it like this:

    ./webkit2png.py --xvfb [...]

    Please try this and tell me if it helps. If not, we can try to change the code so that it starts Xvfb by itself and kills the process before exit.

  5. Alex Ezell Says: 
    February 18th, 2009 at 17:13

    Thanks for taking time to look at it Roland. The part I changed is the part that handles starting Xvfb. This is what I have:

    if options.xvfb:
        # Start 'xvfb' instance by replacing the current process
        newArgs = ["xvfb-run", "-a", "--server-args=-screen 0 1024x768x24", "python"]
        for i in range(0, len(sys.argv)):
            if sys.argv[i] not in ["-x", "--xvfb"]:
                newArgs.append(sys.argv[i])
        logging.debug("Executing %s" % " ".join(newArgs))
        os.execvp(newArgs[0], newArgs)
        raise RuntimeError("Failed to execute '%s'" % newArgs[0])

    I’ve added the “-a” and the “python” arguments to xvfb-run. If I don’t have “-a,” it will fail to start xvfb because it’s already running with that server id. If I don’t have “python” the command passed through xvfb-run is incorrect.

    I have tried it with your original script with no changes and Xvfb still doesn’t die. Perhaps, it’s some environment problem? I noticed that I have to use kill -9 to get the Xvfb process to die. A simple kill won’t work.

  6. Roland Says: 
    February 18th, 2009 at 19:53

    Alex, I’ll reply to you by mail. If we find a solution, I’ll update the article.

  7. Roland Says: 
    February 20th, 2009 at 13:54

    The bug described by Alex seems to be reported here:

    https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/294454

    This has to be fixed by the Ubuntu people. However, we’re testing a workaround that tries to determine the PID of the active xvfb-run-instance at the end of the script and then kill itself with signal 9:

    m = re.match(r".*xvfb-run\.(\d+).*", os.environ['XAUTHORITY'])
    if m:
        os.kill(int(m.group(1)), 9)

    This code has to be injected near line 203, just before “sys.exit(0)”. It requires you to import the module “re” (regular expressions) at the beginning of the script.

    I will not add this to webkit2png.py as I’m really convinced that this bug has to be fixed in Ubuntu’s xvfb package.
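The PID lookup from the workaround above can be tried in isolation. This is a small sketch that assumes, as the comment does, that the XAUTHORITY path contains `xvfb-run.` followed by the process ID; the function name `xvfb_run_pid` and the example paths are hypothetical.

```python
import re

# Pattern from the comment above: digits following "xvfb-run." in the path.
XVFB_RUN_PID = re.compile(r".*xvfb-run\.(\d+).*")


def xvfb_run_pid(xauthority):
    """Return the PID embedded in an XAUTHORITY path, or None if absent."""
    match = XVFB_RUN_PID.match(xauthority or "")
    if match is None:
        return None
    return int(match.group(1))


# Hypothetical example paths, for illustration only.
print(xvfb_run_pid("/tmp/xvfb-run.12345/Xauthority"))
print(xvfb_run_pid("/home/user/.Xauthority"))
```

In the script itself one could pass `os.environ.get("XAUTHORITY")` and call `os.kill(pid, 9)` only when the result is not `None`, which also avoids the `KeyError` that `os.environ['XAUTHORITY']` raises when the variable is unset.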

  8. Hubert Says: 
    March 11th, 2009 at 02:06

    Hi,

    webkit2png.py always fails for me with “failed to load”:

    # ./webkit2png.py -x -o test.png --debug http://news.bbc.co.uk
    DEBUG:root:Executing xvfb-run --server-args=-screen 0, 640x480x24 ./webkit2png.py -o test.png --debug http://news.bbc.co.uk
    DEBUG:root:Initializing class WebkitRenderer
    DEBUG:root:render(http://news.bbc.co.uk, timeout=0)
    DEBUG:root:Processing result
    ERROR:root:Failed to load http://news.bbc.co.uk

    The simple version works fine, I have written a .sh wrapper for it.

    Although it seems to fail on some sites, e.g.:

    ./webkit2png.sh http://www.rbsdigital.com
    QPainter::begin: Paint device returned engine == 0, type: 3
    QPainter::renderHints: Painter must be active to set rendering hints
    [...]

    I’m using libqt4-webkit 4.4.3-2, python-qt4 4.4.2-4 on Debian 5.0.

  9. Roland Says: 
    March 12th, 2009 at 11:59

    I can reproduce issue #2, although I’m very busy at the moment and will not be able to analyse it right now.

    Problem #1 works for me. Please try to modify the script near line 63 and report the results:

    self._page.mainFrame().load(QUrl(url))
    self.__loading = True
    while self.__loading:

    Keep in mind that this is Python, so don’t mix up the indentation.

  10. Roland Says: 
    March 12th, 2009 at 12:12

    Update: Problem #2 occurs because the page does not report a “contentSize”, and the reason is that the site uses a frameset. You can override the contentSize with “--geometry WIDTH HEIGHT”, but this results in an empty image. As I said, I’ll have a look at this as soon as I’m less busy.

    If you want to hack this yourself: I assume that you have to define the geometry of self._page or self._page.mainFrame() at some point before the rendering.

  11. thomas Says: 
    March 22nd, 2009 at 10:35

    Big, big thanks for this solution. I ran into exactly the same problem and was wondering how to solve it quickly. Thanks for a good start with that.

  12. zz Says: 
    April 1st, 2009 at 12:36

    shameless:p
    http://www.insecure.ws/2008/09/16/xserver-less-webpage-screenshot

  13. Roland Says: 
    April 1st, 2009 at 13:22

    @zz: Kang’s post is from September 16th, mine is from December. So if there is somebody to blame for stealing code it’s me, but I swear that I never saw Kang’s script earlier :-)

  14. Paul Says: 
    April 19th, 2009 at 12:44

    This works:

    __self = True

    class WebkitRenderer(QObject):

        # Initializes the QWebPage object and registers some slots
        def __init__(self):
            def __on_load_finished(result):
                __self.__on_load_finished(result)
            def __on_load_started():
                __self.__on_load_started()

            __self = self
            logging.debug("Initializing class %s", self.__class__.__name__)
            self._page = QWebPage()
            self.connect(self._page, SIGNAL("loadFinished(bool)"), __on_load_finished)
            self.connect(self._page, SIGNAL("loadStarted()"), __on_load_started)

  15. VidJa Says: 
    May 4th, 2009 at 10:24

    Hi Roland,

    Thanks for this excellent piece of work. I integrated it in my Django based website. Somewhere in 2007 I had a version of khtml2png2 working, but after a switch to mod_wsgi and various server upgrades I couldn’t get it working anymore.

    I ran into some xvfb issues, however. When I run a test script on the command line of my server, your script runs without error messages using --xvfb, but when I run it from the mod_wsgi environment it generates an error message: Xvfb failed to start.
    When running with --display :0.0 it works from the wsgi script, but with an error message: style cannot be used together with the GTK_Qt engine. Anyway, the last one works for me.

    (Ubuntu 9.04)

    # testscript
    import os, sys, subprocess

    options=['webkit2png.py',
    '--display', ':0.0',
    '-g', '1024', '768',
    u'http://www.dpreview.com',
    '--scale','128','92',
    '-o','dpreview.png']

    p=subprocess.Popen(options,0)
    output,errors=p.communicate()

  16. Roland Says: 
    May 4th, 2009 at 11:39

    Hi VidJa,

    Thanks for this report. I’ll have a look at it later.

    Update: I think this is an issue with mod_wsgi. Sadly, xvfb-run does not provide some sort of --verbose flag. Can you run it with “strace” (by modifying webkit2png.py)?

    Maybe xvfb-run does not have the permission to write the authority-file? The man page says that this file is written to the directory defined by TMPDIR or /tmp.

    Another reason might be that the memory is limited by mod_wsgi.

  17. Ariya Says: 
    June 8th, 2009 at 08:36

    Check also similar Qt/C++ code I wrote some time ago:
    http://labs.trolltech.com/blogs/2008/11/03/thumbnail-preview-of-web-page/
    http://labs.trolltech.com/blogs/2009/01/15/capturing-web-pages/

  18. Jorge Pereira Says: 
    June 17th, 2009 at 13:01

    Hi everyone,

    Regarding the issue with Xvfb staying up, it’s enough to pass “-terminate” to the server args. So, line 154 would look like:
    newArgs = ["xvfb-run", "--server-args=-terminate -screen 0, 640x480x24", sys.argv[0]]

    However, xvfb-run is already trying to kill Xvfb, so using this will trigger a warning message from xvfb-run.

    An option to skip this message would be to skip xvfb-run (it’s just a simple shell script anyway) and call Xvfb directly. As for xvfb, one of the following could be done:
    - change xvfb-run to use -terminate instead of issuing a kill (recommended?)
    - change xvfb-run to use kill -9

    Regards,

  19. Anonymous Says: 
    July 20th, 2009 at 06:55

    For those of you who might be getting the error:
    "QPainter::begin: Paint device returned engine == 0, type: 3"

    There are a couple possible reasons:
    - The page is greater than 32,768 pixels (2^15 px) in any dimension (http://doc.trolltech.com/4.5/qpainter.html#limitations)
    - The page is framed and messing with the image dimensions.

    Hope this saves someone a massive headache.
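A guard for the first of these causes could look like the sketch below. It takes the limit of 2^15 pixels per dimension from the comment above at face value; the function name `exceeds_painter_limit` is hypothetical, and plain integers stand in for the page size.

```python
# Limit quoted in the comment above: 2**15 pixels in either dimension.
MAX_DIMENSION = 2 ** 15


def exceeds_painter_limit(width, height, limit=MAX_DIMENSION):
    """Return True if either dimension is greater than the limit."""
    return width > limit or height > limit


print(exceeds_painter_limit(1024, 768))
print(exceeds_painter_limit(1024, 40000))
```

In the tutorial script, such a check could be applied to the width and height of `contentsSize()` before the QImage is created, so that an oversized page is reported instead of producing the QPainter error.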

  20. Rob Sanderson Says: 
    July 20th, 2009 at 17:27

    Is there an easy way to fire this multiple times from a single script? For example, a crawler that takes snapshots of all of the pages that it visits? Other than the obvious commands.getoutput() of course :)

    Many thanks!

  21. Adam Nelson Says: 
    August 6th, 2009 at 20:41

    Roland,

    Would you consider getting this script onto PyPI as well as GitHub, BitBucket, or Google Code?

    It’s the best script I’ve come across for this job and it would be great to see it built out by the community. If you don’t want to do you mind if I do? I’d like to use this in a few places and if it were available from PyPI it would be great.

    Cheers,
    Adam

  22. Roland Says: 
    August 10th, 2009 at 15:15

    At the moment I’m still too busy to package this for PyPI myself, but I don’t mind if you do so!

  23. Marc Says: 
    August 18th, 2009 at 03:50

    This script ROCKS!

    I got this working finally and it renders great. Wish I could make it faster. I had this working on a Mac before and it was quite fast. Now running on Linux (yea!)…

    anyway, I can’t get Flash to render. Any ideas? I am pretty certain flash is installed on the server, but maybe need to put it somewhere.

  24. Charlie Clark Says: 
    September 3rd, 2009 at 16:50

    I’m caught between this and simply calling websnap or CutyCapt as a subprocess. Anyone struggling with xvfb-run might try adding -f to the command list as this stops xvfb complaining it can’t start the server.

  25. Cole Says: 
    September 8th, 2009 at 16:33

    I made some modifications to your script and thought I would share: http://pastie.org/609626
    And the diff: http://pastie.org/609631

    Added a simple networkAccessManager to handle bad ssl certificates (we use self-signed certs on some pages I wanted to thumbnail). It could easily be extended to do something more intelligent, but it works for us.

    Added another option for aspect ratio: crop. This renders the full page the same as expand, then crops to the desired size. This gives better results for short pages like google than setting the browser size and using ignore aspect ratio.

    If anyone knows how to do a higher quality resize in QT I would be interested to hear. It seems to be doing simple linear interpolation which gives very poor results especially for text.

  26. Bob Says: 
    September 12th, 2009 at 03:20

    Does anyone know how to get this to display Flash plugins?

    I’ve tried enabling plugins in the script and also using Adobe Flash 32bit and 64bit or swfdec and gnash. None of them seem to work.

  27. Adam Nelson Says: 
    September 18th, 2009 at 01:14

    As per Roland’s comment, I moved this to a public repository so people can collaborate on this.

    http://github.com/AdamN/python-webkit2png

    This includes Coles modifications.

    Feel free to make updates, fork, etc…

  28. Luay Says: 
    September 22nd, 2009 at 21:30

    Good Day everyone,

    thank you for your effort, the idea looks really nice.

    i will start a website soon, in which i need a snapshot functionality, so i landed in this page.

    my website as i got from the host, will be hosted on linux and supports python,

    MY PROBLEM :) is that i come from windows background with eperience in ASP, and little bit PHP (which i will use for the website).

    questions are:
    what are the pre requirments to use your project on linux host, python, and php support, (i read things about Qt but i dont know what is it).

    and the second question is: are there some steps how to setup this on the host and use it from within PHP.

    thank you very much and accept my best regards,
    Luay

  29. Roland Says: 
    September 24th, 2009 at 12:18

    Hi Luay,

    Besides Python, you should have installed the “webkit” library of the Qt package and the PyQt4 package for Python. Beyond that you’ll need an X11 server; “Xvfb” should be sufficient for a headless machine.

    I suggest using your distribution’s package management to install these dependencies. If you tell me what distribution you are using I might be able to tell you the package names.

    Qt is a library for GUI programming which comes with its own HTML rendering engine, WebKit. Please have a look at Wikipedia for further information.

    Good luck!
    Roland

  30. Luay Says: 
    September 25th, 2009 at 19:38

    Hello Roland,

    thank you very much for the response, do you know a host name which supports such packages,

    i asked the host i suppose to host with, and they have absolutely no idee :)

    Thank you,
    Luay

  31. Roland Says: 
    September 26th, 2009 at 09:09

    Oh ok, I assumed you were running your own server. Sorry, I don’t think I can help you in that question.

  32. Luay Says: 
    September 26th, 2009 at 14:31

    Nevertheless, thank you very much

  33. Ruby On Rails Entwicklung Says: 
    September 26th, 2009 at 20:23

    I just released a ruby-package to generate thumbshots using your script:
    http://github.com/digineo/thumbshooter

  34. Adam Nelson Says: 
    September 30th, 2009 at 00:40

    @Luay http://webfaction.com has great support for Python stuff – you could try them.

  35. Ben Standefer Says: 
    October 6th, 2009 at 02:23

    Roland,

    I am having the same exact issue as Hubert. Looks like something with the Debian install of Qt4 makes the simple script work, but webkit2png.py reports “Failed to load” messages on all pages. I debugged for about 2 hours, but I am not Qt expert, and I only got “Failed to load” messages, indefinite hanging, or blank renders.

    I documented on the github repo:
    http://github.com/AdamN/python-webkit2png/issues/#issue/2

    Nice work though, looks excellent!

    -Ben Standefer

  36. Loic Says: 
    October 6th, 2009 at 12:08

    I got the same problem that Hubert reported in March:
    # ./webkit2png.py -x -o test.png --debug http://news.bbc.co.uk
    DEBUG:root:Executing xvfb-run --server-args=-screen 0, 640x480x24 ./webkit2png.py -o test.png --debug http://news.bbc.co.uk
    DEBUG:root:Initializing class WebkitRenderer
    DEBUG:root:render(http://news.bbc.co.uk, timeout=0)
    DEBUG:root:Processing result
    ERROR:root:Failed to load http://news.bbc.co.uk

    script version is from github.
    my python version is : Python 2.5.2

    The webkit2png-simple script works fine.
    And I think I nailed the problem down to the callbacks not being called back….

    __on_load_started is never called, it seems…

    If I change
    - self.connect(self._page, SIGNAL("loadStarted()"), self.__on_load_started)
    + self.connect(self._page, SIGNAL("loadStarted()"), onLoadStarted)
    with:
    +def onLoadStarted():
    +    print "load started"

    I get a nice log :
    DEBUG:root:Initializing class WebkitRenderer
    DEBUG:root:render(http://www.google.com, timeout=20)
    load started
    ERROR:root:Request timed out

    So, is it because my python is too old ?
    does object method-callbacks works ?

  37. Roland Says: 
    October 7th, 2009 at 17:47

    Strange, two people reporting the same issue. Ben, what Python and Qt versions are you using?

  38. asdfa Says: 
    November 10th, 2009 at 23:09

    hi,

    thank you. i’m using the same approach. but i want the captured picture be exactly the size of the web page. if i use your approach, the screen shot will be the size of the web frame, and i often see the scroll bar, because the frame is smaller than the web page.

    so how can i make a screen shot of the entire web page?

    thanks.

  39. Roland Says: 
    November 11th, 2009 at 20:38

    I think this might be a problem with the size of the “virtual desktop”. Maybe I have a chance to spend more time with this script in the near future.

  40. mariuz Says: 
    November 23rd, 2009 at 17:04

    PyQt4.QtWebKit import *

    i have added self._page.settings().setAttribute(QWebSettings.PluginsEnabled, True)
    at line 43

    i want to load a page with flash content , shameless plug reea.net

    webkit2png.py --scale 200 200 -x -o reea.png --debug http://reea.net

    but seems i have an flash error and the image is quite empty

    Adobe Flash Player: gtk_clipboard_get(GDK_SELECTION_PRIMARY); failed. Trying to call gtk_init(0,0);
    Xlib: extension “RANDR” missing on display “:99.0″.

  41. lennart Says: 
    November 23rd, 2009 at 19:53

    >thank you. i’m using the same approach. but i want the
    >captured picture be exactly the size of the web page. if
    >i use your approach, the screen shot will be the size of
    >the web frame, and i often see the scroll bar, because
    >the frame is smaller than the web page.

    Regarding this. There seems to me to be a bug in the QWebPage::mainFrame()::contentSize method. When a page doesn’t contain any child frame the method works fine, but when the mainframe does contain child frames the mainFrame will not return the proper content size.

    Also, I have the same issue with Flash content not being rendered. I installed the 64-bit alpha version of the Flash player and the content gets initialized (i.e. I can see it downloads all the data associated with it) but never gets fully rendered.

  42. Benino Says: 
    February 10th, 2010 at 01:49

    Hi. This is in response to Cole’s question about resize quality. It was posted a while back, but others may have the same issue.

    Quality is greatly reduced when shrinking the screenshots because the default scaling uses poor interpolation. If you add a parameter to the image.scaled() function call you can increase the quality of the resized screen capture.
    By default it uses the “Qt.FastTransformation” mode.
    If you manually set the mode to “Qt.SmoothTransformation” you’ll get a much nicer looking image. I tested this with a script taking 15 screenshots and sizing them down to a max of 400×400 while maintaining the aspect ratio. I timed the script and it actually ran 1 second faster with SmoothTransformation than with FastTransformation. I’m sure this was only faster because of fluctuations in page load times, but the change in processing time from fast to smooth was clearly negligible for 15 screenshots.

    Here’s my change:

    mode = Qt.SmoothTransformation

    image = image.scaled(options.scale[0], options.scale[1], ratio, mode)
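The arithmetic behind sizing an image down to a maximum box while keeping its aspect ratio can be written out in plain Python. This is an illustrative sketch, not Qt’s implementation: the function name `fit_within` is made up, and the integer rounding may differ from what Qt does internally.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) to fit inside the box, keeping the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    # Width that results if the height is scaled to exactly max_height.
    scaled_width = max_height * width // height
    if scaled_width <= max_width:
        return scaled_width, max_height
    # Otherwise the width is the limiting side.
    return max_width, max_width * height // width


print(fit_within(1024, 768, 400, 400))
print(fit_within(800, 3000, 400, 400))
```

A 1024×768 capture comes out as 400×300, while a tall 800×3000 page becomes 106×400. Note that this sketch also scales small images up to the box, so a caller that only wants to shrink should compare the sizes first.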

  43. Jason Huggins Says: 
    February 28th, 2010 at 16:20

    I would love to include a version of this in Selenium, but the GPL prevents me from including it in the project. (Selenium is Apache2-licensed.) Would you be willing to re-license webkit2png.py as MIT/BSD/Apache2?

  44. Roland Says: 
    March 1st, 2010 at 10:29

    Sorry, but this is not possible due to the restrictions of PyQt4:

    Like Qt, PyQt v4 is available on all platforms under a variety of licenses including the GNU GPL (v2 and v3) and a commercial license. Unlike Qt, PyQt v4 is not available under the LGPL. You can purchase the commercial version of PyQt.

  45. Benino Says: 
    March 2nd, 2010 at 21:39

    I’m having a problem when a website has a confirmation alert type window asking if you really want to leave the page (ok,cancel). The script just hangs. The screenshot is not saved and the script doesn’t terminate after the timeout or anything. I guess it is unable to close the browser because of the alert? Is there a way to deal with this situation?

  46. Riccardo Says: 
    March 9th, 2010 at 11:50

    Hi, thanks to all. You did excellent work. Unfortunately I have a problem with SSL certificates. I need to take screenshots of a web page which can be viewed only with an SSL certificate installed in the browser. How can I specify a certificate to open a web page? The script version on Mac works properly. Can anyone help me? Thanks for your help.

  47. Roland Says: 
    March 9th, 2010 at 12:02

    Riccardo, this should not make a problem since _on_ssl_errors() (git-version of webkit2png.py) should accept every ssl certificate out there. If this is not the case, can you name an URL to test this behaviour?

  48. Riccardo Says: 
    March 9th, 2010 at 12:16

    Hi Roland, thank you for your fast reply. I can give you the link, but to see its content you need a valid certificate. I can see the page with my browser (after installing my personal certificate) but can’t get screenshots. Here is the command:
    riccardo@riccardo-vm2:~/Downloads$ python webkit2png.py -o n1.png --debug "https://sam-it-roc.cern.ch/nagios/cgi-bin/status.cgi?hostgroup=site-GRISU-COMETA-INFN-CT&style=detail"
    DEBUG:root:Initializing class WebkitRenderer
    DEBUG:root:render(https://sam-it-roc.cern.ch/nagios/cgi-bin/status.cgi?hostgroup=site-GRISU-COMETA-INFN-CT&style=detail, timeout=0)
    DEBUG:root:loading started
    DEBUG:root:loading finished with result False
    DEBUG:root:Processing result
    webkit2png.py:205: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
    logging.error(e.message)
    ERROR:root:Failed to load https://sam-it-roc.cern.ch/nagios/cgi-bin/status.cgi?hostgroup=site-GRISU-COMETA-INFN-CT&style=detail

    Thanks for help

  49. Roland Says: 
    March 9th, 2010 at 13:38

    Sorry, I misunderstood you. I’ve searched for a while and found that QNetworkAccessManager emits a signal called “authenticationRequired” which has a QAuthenticator parameter, but that only handles username and password authentication.

    I would suggest to ask some Qt4 developers how this would be done in C++. Maybe this can be ported to python.

  50. Riccardo Says: 
    March 10th, 2010 at 10:40

    Hi Roland, thanks again for your help. Searching the web I found PySide, which provides Qt bindings for Python. What I don’t understand is at what level (web page, network) and where to set the certificate. QSslSocket (http://www.pyside.org/docs/pyside/PySide/QtNetwork/QSslSocket.html#PySide.QtNetwork.QSslSocket.addCaCertificate) allows specifying a certificate, but how do I connect it to WebPage or to NetworkAccessManager? Another way could be to call userAgentForUrl, but it’s not clear to me how to use it (http://www.pyside.org/docs/pyside/PySide/QtWebKit/QWebPage.html#PySide.QtWebKit.QWebPage.userAgentForUrl).
    Any ideas?

  51. Roland Says: 
    March 10th, 2010 at 11:38

    Sorry, I don’t have any. I have no experience with client-side certificates, and at the moment I’m too busy to learn about them. As I said, if you could get a C example from some Qt guys showing how to use client-side certificates with QtWebKit, I would be able to port that code into the Python script.

    PySide: Yesterday I heard about this project for the first time. Adam and I are going to support this library some day (http://github.com/AdamN/python-webkit2png/issues#issue/5)

    Edit: I had a short look at QSslSocket. What you want is a “localCertificate”. I assume you have to create your own implementation of NetworkAccessManager that creates a QSslSocket with your local certificate and assign this manager to the QWebPage instance, but I’m not sure about this. Maybe there is a “global socket factory” or something like this that may be modified.

  52. nn5 Says: 
    March 19th, 2010 at 22:27

    How can I make locale-specific letters appear correctly in generated images? Right now I just get “boxes” instead of special characters. Try to capture http://google.pl for example.

  53. Roland Says: 
    March 20th, 2010 at 12:52

    Your local X server (or xvfb) has to be able to render those characters. It seems that the system lacks the required fonts.

  54. Benino Says: 
    April 6th, 2010 at 21:52

    Just thought I’d see if anyone had a solution for this. When a webpage contains an alert or prompt window when the page loads, the screen capture script just hangs. It doesn’t time out if you set a timeout either. It hangs during the call:
    QCoreApplication.processEvents()

    The script then has to be shutdown manually.

  55. Grammar Girl Says: 
    April 11th, 2010 at 12:51

    Great article! One note: In the second paragraph, “into it’s framework” should be “into its framework” (possessive). Thanks!

    Roland: Thank you!

  56. nm Says: 
    April 11th, 2010 at 13:48

    wkhtmltopdf 0.9 can be obtained with a modified qt which doesn’t need to talk to an X server … thus you can avoid mucking around with xvfb.

    http://code.google.com/p/wkhtmltopdf/

    Just a thought! Then you can convert from there to PNG or whatever …

    ——NM

  57. Cybso Says: 
    April 11th, 2010 at 13:59

    Interesting… as wkhtmltopdf is using the same library, I wonder how they do it. I think it’s time to have a look at their code.

  58. Jamie Plenderleith Says: 
    April 11th, 2010 at 16:50

    You could also use the Mugurdy API: http://blog.mugurdy.com/post/2009/12/05/API-for-screenshots-of-webpages.aspx

  59. Marcus Bointon Says: 
    June 7th, 2010 at 13:37

    I tracked down one cause of the ‘Xvfb failed to start’ errors. The webkit2png.py --xvfb option requires two parameters for width and height, which I don’t think were needed in older versions of the script (it was hard-coded for 640×480). I’ve wrapped this in a PHP script that takes web params for the URL and a bounding square size and uses GraphicsMagick to resize the thumbnail smoothly, sharpen it a bit and convert it to JPEG. Works great for me (you’ll need to wrap PHP tags around this, as this blog filters them out):

    =======
    if (!array_key_exists('url', $_GET)) exit;
    if (!array_key_exists('size', $_GET)) $_GET['size']=320;

    $box = (integer)$_GET['size'].'x'.(integer)$_GET['size'];

    header('Content-Type: image/jpeg');
    header('Content-Disposition: inline; filename=preview.jpg');
    $command = '/usr/bin/python webkit2png.py --xvfb 640 480 --format png --aspect-ratio keep '.escapeshellarg($_GET[$
    $command .= ' | /usr/bin/gm convert - -size '.$box.' -resize '.$box.' -sharpen 1 -quality 95 profile "*" jpeg:-';
    //echo $command;
    passthru($command);
    =======

    Thanks for a great script!

  60. Marcus Bointon Says: 
    June 7th, 2010 at 13:57

    Sorry, just realised one line got truncated in that script:

    if (!array_key_exists('url', $_GET)) exit;
    if (!array_key_exists('size', $_GET)) $_GET['size']=320;

    $box = (integer)$_GET['size'].'x'.(integer)$_GET['size'];

    header('Content-Type: image/jpeg');
    header('Content-Disposition: inline; filename=preview.jpg');
    $command = '/usr/bin/python webkit2png.py --xvfb 640 480 --format png --aspect-ratio keep '.escapeshellarg($_GET['url']);
    $command .= ' | /usr/bin/gm convert - -size '.$box.' -resize '.$box.' -sharpen 1 -quality 95 +profile "*" jpeg:-';
    //echo $command;
    passthru($command);
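
For anyone who would rather stay in Python, the PHP wrapper above can be approximated with the subprocess module. This is only a sketch under assumptions: the flags and the /usr/bin/python and /usr/bin/gm paths are copied from the comment above, while the function names build_commands and make_thumbnail are hypothetical and not part of webkit2png.

```python
import subprocess


def build_commands(url, size=320, script="webkit2png.py"):
    """Return the two argument lists of the screenshot | resize pipeline."""
    box = "%dx%d" % (int(size), int(size))
    # Flags copied from the PHP comment above. Passing lists (no shell)
    # means the URL needs no extra quoting or escaping.
    capture = ["/usr/bin/python", script, "--xvfb", "640", "480",
               "--format", "png", "--aspect-ratio", "keep", url]
    resize = ["/usr/bin/gm", "convert", "-", "-size", box, "-resize", box,
              "-sharpen", "1", "-quality", "95", "+profile", "*", "jpeg:-"]
    return capture, resize


def make_thumbnail(url, size=320):
    """Run the pipeline and return the JPEG bytes (hypothetical helper)."""
    capture, resize = build_commands(url, size)
    shot = subprocess.Popen(capture, stdout=subprocess.PIPE)
    thumb = subprocess.Popen(resize, stdin=shot.stdout,
                             stdout=subprocess.PIPE)
    shot.stdout.close()  # let the first process get SIGPIPE if gm exits early
    data = thumb.communicate()[0]
    shot.wait()
    return data
```

Because no shell is involved, there is nothing for this blog's quote and dash mangling to break once the script is saved to a file, and the URL is passed through as a single argument.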

  61. Christoph Burgdorfer Says: 
    June 7th, 2010 at 14:53

    Hi Marcus,

    I am trying to get your script to work but I think posting it here messes up not only line breaks but also dashes, commas, quotes etc.

    Would it be possible to post your script on pastebin.com ? … (make it permanent if possible)

    I’m getting lots of “webkit2png.py: error: incorrect number of arguments” errors and can’t figure out what I’m doing wrong.

    /usr/bin/python /home/christoph/webkit2png.py -x 640 480 -g 640 480 -o test.png -f png --aspect-ratio keep 'http://www.yahoo.com'

    Thanks!

  62. Christoph Burgdorfer Says: 
    June 7th, 2010 at 15:22

    Also worth noting maybe in this context:

    xvfb-run --server-args="-screen 0, 640x480x24" python /home/christoph/webkit2png.py -x -g 640 480 -o test.png http://www.yahoo.com

    gives me a:

    Xvfb failed to start

    and

    batman:/var/www/dev/cliscreenshot# xvfb-run --server-args="-screen 0, 640x480x24" python /home/christoph/webkit2png.py -g 640 480 -o test.png http://www.yahoo.com

    gives me:

    ERROR:root:Failed to load http://www.yahoo.com

  63. Christoph Burgdorfer Says: 
    June 9th, 2010 at 12:43

    I’ve found out what the problem was! I didn’t download it from github but from this site (doh! :) )

    The renderer, however, has problems displaying slightly more sophisticated CSS/JavaScript. What’s the best way of improving this? I’m trying to run the script from a web server.

  64. Jori Says: 
    July 15th, 2010 at 19:18

    Thanks for this script! Ended up installing AdamN’s release and it works fine.

    Is it normal under Xvfb that fonts render a bit funny? I mean that there’s extra space between letters and they look bigger than on my Mac. Here’s a sample from Yahoo UK: http://cl.ly/70474114adb64ff00cf7

  65. Nevio Says: 
    September 15th, 2010 at 13:05

    Hi.

    I have a little problem with your script… Whenever I turn on the javascript feature, it keeps outputting debug lines. I don’t have debug enabled :/

    Can you please help. Thanks.

    This is my command:

    xvfb-run --server-args="-screen 0, 680x480x24" python /var/server/stayself.dev/public/wp-content/plugins/screener/lib/lib/Shot/Py/shot.py http://URL --xvfb 1024 0 --geometry 1024 0 --aspect-ratio ignore --wait 5 --timeout 5 --feature javascript -x 1024 0 --log /var/server/stayself.dev/public/data/shoots-01/../logs/shot-logs.txt --format=jpg

    It keeps outputting:

    ** (process:20001): DEBUG: NP_Initialize
    ** (process:20001): DEBUG: NP_Initialize succeeded
    ** (process:20001): DEBUG: NP_Initialize
    ** (process:20001): DEBUG: NP_Initialize succeeded
    ** (process:20001): DEBUG: NP_Initialize
    ** (process:20001): DEBUG: NP_Initialize succeeded
    ** (process:20001): DEBUG: NP_Initialize
    ** (process:20001): DEBUG: NP_Initialize succeeded

    before the image data, and I can’t find a way to remove it :/

  66. Cybso Says: 
    September 15th, 2010 at 13:45

    Have you tried the current github version from http://github.com/AdamN/python-webkit2png/? It writes error or debugging messages to STDERR and the resulting image to STDOUT. You can suppress the messages by appending “2>/dev/null” if you want.

    Additionally, the arguments “-x” and “--xvfb” are the same, and both should not be required when using xvfb-run explicitly.
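
When the script is called from another Python program rather than from a shell, the same separation of streams can be done with subprocess instead of “2>/dev/null”. The sketch below is an illustration only: run_quietly is a hypothetical helper, and the demo uses a stand-in child process rather than webkit2png.py itself, on the assumption (taken from the comment above) that the image arrives on STDOUT and the messages on STDERR.

```python
import subprocess
import sys


def run_quietly(command):
    """Run command; return (exit_code, stdout_bytes) and drop stderr."""
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in for webkit2png.py: payload on stdout, chatter on stderr.
    fake = [sys.executable, "-c",
            "import sys; sys.stdout.write('IMG'); sys.stderr.write('DEBUG')"]
    code, data = run_quietly(fake)
    print(code, data)
```

Only the bytes written to STDOUT are returned, so debug lines can no longer end up in front of the image data.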

  67. Glenn Says: 
    October 7th, 2010 at 18:31

    This appears to work only on Ubuntu? I am trying to run this on SLES 11 SP1 and it doesn’t have the xvfb-run command. I tried copying that from Red Hat, but it doesn’t work; it just gives the error:
    webkit2png.py: cannot connect to X server

    I’ve tried starting an Xvfb service before running the script, but I get the same error.

    Is there a way to run the script without the xvfb-run command?
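
One possible cause of “cannot connect to X server” with a manually started Xvfb is that the script's environment has no DISPLAY variable pointing at it; the thread does not confirm that this is what happened here. The following sketch assumes an Xvfb binary on the PATH and an unused display number :99; env_for_display and capture_with_xvfb are hypothetical helpers, not part of webkit2png.

```python
import os
import subprocess
import time


def env_for_display(display=":99", base_env=None):
    """Return a copy of the environment with DISPLAY pointing at Xvfb."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env


def capture_with_xvfb(command, display=":99", screen="640x480x24"):
    """Start Xvfb, run command against it, then stop Xvfb again."""
    xvfb = subprocess.Popen(["Xvfb", display, "-screen", "0", screen])
    try:
        time.sleep(1)  # crude wait for the X server to accept connections
        return subprocess.call(command, env=env_for_display(display))
    finally:
        xvfb.terminate()
        xvfb.wait()
```

A call such as capture_with_xvfb(["python", "webkit2png.py", "-o", "test.png", "http://www.yahoo.com"]) would then run the script without xvfb-run.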

  68. Drazen Says: 
    April 22nd, 2011 at 10:38

    Great article. I took Adam Nelson’s script and it works like a charm… EXCEPT flash.

    I installed the Debian package flashplugin-nonfree and added the --feature plugins --window parameters.

    Debug says nothing about flash:
    DEBUG:webkit2png:Version 20091224, Python 2.5.2 (r252:60911, Jan 24 2010, 17:44:40)
    [GCC 4.3.2], Qt 4.4.3
    DEBUG:webkit2png:loading started
    DEBUG:webkit2png:loading finished with result True
    DEBUG:webkit2png:Processing result
    DEBUG:webkit2png:contentsSize: PyQt4.QtCore.QSize(682, 512)
    DEBUG:webkit2png:Waiting 5 seconds

    Do I have to put libflashplayer.so somewhere? Or change some config? Any help is welcome.

  69. A name Says: 
    June 1st, 2011 at 07:09

    PhantomJS (and PyPhantomJS) makes this completely obsolete.

    http://code.google.com/p/phantomjs/

  70. Cybso Says: 
    June 1st, 2011 at 12:43

    Sounds very interesting, and I agree that this could be a much cleaner solution for most people. The only problem remains flash, because it is not rendered by webkit but drawn on the X window through the plugin.

  71. Steffen Says: 
    June 4th, 2011 at 17:27

    Thank you, @A name, this is a nice solution.

  72. Ravi Says: 
    June 5th, 2011 at 13:21

    Thanks. This is already there in Qt docs but hidden :)
    Nice to see a detailed explanation.

  73. philips Says: 
    June 21st, 2011 at 14:34

    It’s very useful. I’ve made a web scraping tool using QtWebKit and PySide too; I hope it can help others.

  74. Bo Says: 
    September 11th, 2011 at 23:47

    I have not used your script yet, I downloaded Paparazzi! but it does not grab email or finder windows, which is a huge disappointment. Will webkit2png capture finder or email windows? Thanks.

  75. Ank Says: 
    September 21st, 2011 at 15:54

    @’A name’ and @Steffen

    I actually use CutyCapt so far and am very keen to see a better implementation, but phantomjs fails on the exact same URLs as CutyCapt and, I expect, so does the implementation in this blog.

    Here’s one example (I can provide more if you like):

    http://www.ameinfo.com/ar-211316.html

    Why do you feel phantomjs makes this obsolete?

  76. Take webpage screenshot from command line in Ubuntu Linux - Binary Tides Says: 
    November 2nd, 2011 at 08:15

    [...] : 1. http://www.blogs.uni-osnabrueck.de/rotapken/2008/12/03/create-screenshots-of-a-web-page-using-python... 2. [...]

  77. A name Says: 
    November 3rd, 2011 at 00:42

    Because the project already implements QtWebKit with an API that allows you to control it via JavaScript. Anything that fails on PhantomJS is VERY likely to fail on this as well, because QtWebKit is the main browser still, so you might as well use a working solution. ;)

  78. Ben Says: 
    December 5th, 2011 at 00:23

    Hi, I’m having an issue with this Python script. I have Xvfb running, but when I run webkit2png I always get this output. Is there any way I can get more info about what failed? Thanks!

    python2.7 webkit2png.py -o test.png -t 10 --debug http://www.google.com
    DEBUG:root:Initializing class WebkitRenderer
    DEBUG:root:render(http://www.google.com, timeout=10)
    DEBUG:root:Processing result
    ERROR:root:Failed to load http://www.google.com
    admin ~ # 5 XSELINUXs still allocated at reset
    SCREEN: 0 objects of 80 bytes = 0 total bytes 0 private allocs
    DEVICE: 4 objects of 64 bytes = 256 total bytes 0 private allocs
    CLIENT: 0 objects of 44 bytes = 0 total bytes 0 private allocs
    WINDOW: 0 objects of 16 bytes = 0 total bytes 0 private allocs
    PIXMAP: 1 objects of 8 bytes = 8 total bytes 0 private allocs
    GC: 0 objects of 44 bytes = 0 total bytes 0 private allocs
    CURSOR: 0 objects of 4 bytes = 0 total bytes 0 private allocs
    CURSOR_BITS: 0 objects of 4 bytes = 0 total bytes 0 private allocs
    DBE_WINDOW: 0 objects of 12 bytes = 0 total bytes 0 private allocs
    TOTAL: 5 objects, 264 bytes, 0 allocs
    4 DEVICEs still allocated at reset
    DEVICE: 4 objects of 64 bytes = 256 total bytes 0 private allocs
    CLIENT: 0 objects of 44 bytes = 0 total bytes 0 private allocs
    WINDOW: 0 objects of 16 bytes = 0 total bytes 0 private allocs
    PIXMAP: 1 objects of 8 bytes = 8 total bytes 0 private allocs
    GC: 0 objects of 44 bytes = 0 total bytes 0 private allocs
    CURSOR: 0 objects of 4 bytes = 0 total bytes 0 private allocs
    CURSOR_BITS: 0 objects of 4 bytes = 0 total bytes 0 private allocs
    DBE_WINDOW: 0 objects of 12 bytes = 0 total bytes 0 private allocs
    TOTAL: 5 objects, 264 bytes, 0 allocs
    1 PIXMAPs still allocated at reset
    PIXMAP: 1 objects of 8 bytes = 8 total bytes 0 private allocs
    GC: 0 objects of 44 bytes = 0 total bytes 0 private allocs
    CURSOR: 0 objects of 4 bytes = 0 total bytes 0 private allocs
    CURSOR_BITS: 0 objects of 4 bytes = 0 total bytes 0 private allocs
    DBE_WINDOW: 0 objects of 12 bytes = 0 total bytes 0 private allocs
    TOTAL: 1 objects, 8 bytes, 0 allocs
    [dix] Could not init font path element /usr/share/fonts/OTF/, removing from list!

  79. Writing a web bot in Python – Part 1 | Ako Kaman Says: 
    March 2nd, 2012 at 00:05

    [...] headache. But the plus is that, you can do virtually anything with QtWebKit. For example you can use QtWebKit to create screenshots of your web pages! So, sky is the [...]
