dieyushi's Blog

Analysis of the Google Registration Process

April 05 2013, coding

Update

2013-5-23

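This method no longer works because the page has since changed. Treat the post as an illustration of the approach; for the specifics, you can analyze the current page yourself.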

Introduction

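The Google account registration page is https://accounts.google.com/NewAccount. Looking at how registration works, I found that every POST variable can be located in the page except bgresponse, which cannot be read from the page directly. bgresponse exists specifically to check whether the client is a bot. It relies on Google's botguard technology, and if the value is not sent correctly, Google requires phone verification. The value can be obtained as follows.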

Program

<html>
<body>

<script type="text/javascript" src="https://www.google.com/js/bg/JnBnYvDSdG9SCf318u50U-J5uM1u66XDgtBIu4U1f7s.js"></script>
<script>
        document.bg = new botguard.bg('');
        if (document.bg) {
            document.bg.invoke(function(response) {
                    document.write(response);
            });
        }

</script>
</body>
</html>

Load this page with pywebkitgtk to get the page as it stands after the JavaScript has run:

#!/usr/bin/env python
import sys
import gtk
import webkit
import warnings
from optparse import OptionParser

warnings.filterwarnings('ignore')

class WebView(webkit.WebView):
    def get_html(self):
        self.execute_script('oldtitle=document.title;document.title=document.documentElement.innerHTML;')
        html = self.get_main_frame().get_title()
        self.execute_script('document.title=oldtitle;')
        return html

class Crawler(gtk.Window):
    def __init__(self, url, file):
        gtk.gdk.threads_init() # suggested by Nicholas Herriot for Ubuntu Koala
        gtk.Window.__init__(self)
        self._url = url
        self._file = file

    def crawl(self):
        view = WebView()
        # connect the handler before starting the load so the signal cannot be missed
        view.connect('load-finished', self._finished_loading)
        view.open(self._url)
        self.add(view)
        gtk.main()

    def _finished_loading(self, view, frame):
        with open(self._file, 'w') as f:
            f.write(view.get_html())
        gtk.main_quit()

def main():
    options = get_cmd_options()
    crawler = Crawler(options.url, options.file)
    crawler.crawl()

def get_cmd_options():
    """
        gets and validates the input from the command line
    """
    usage = "usage: %prog [options] args"
    parser = OptionParser(usage)
    parser.add_option('-u', '--url', dest = 'url', help = 'URL to fetch data from')
    parser.add_option('-f', '--file', dest = 'file', help = 'Local file path to save data to')

    (options,args) = parser.parse_args()

    if not options.url:
        print 'You must specify a URL.',sys.argv[0],'--help for more details'
        sys.exit(1)
    if not options.file:
        print 'You must specify a destination file.',sys.argv[0],'--help for more details'
        sys.exit(1)

    return options

if __name__ == '__main__':
    main()

This yields the value of bgresponse.
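The command-line handling in the script above uses optparse with manual checks for the two required options. As a minimal sketch, assuming a Python 3 environment (the original script targets Python 2 with pywebkitgtk), the same validation could be expressed with argparse. The function name parse_options and the sample arguments are hypothetical and only serve as illustration.

```python
import argparse


def parse_options(argv=None):
    """Parse the -u/--url and -f/--file options; both are required."""
    parser = argparse.ArgumentParser(
        description="Save a JavaScript-rendered page to a local file")
    parser.add_argument("-u", "--url", required=True,
                        help="URL to fetch data from")
    parser.add_argument("-f", "--file", required=True,
                        help="Local file path to save data to")
    # argparse prints a usage message and exits with status 2
    # when a required option is missing
    return parser.parse_args(argv)


# Example call with an explicit argument list (hypothetical values)
options = parse_options(["-u", "file:///tmp/bg.html", "-f", "out.html"])
print(options.url, options.file)
```

Marking the options as required lets argparse produce the error message and exit code, so the explicit `if not options.url` checks are no longer needed.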
