Trying out Scrapy

Posted by abloz on October 31, 2012

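Zhou Haihan (周海汉). The previous post, "Installing Scrapy", covered installation and resolved an OpenSSL build failure. This post takes Scrapy for a test run.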

[zhouhh@Hadoop48 python]$ scrapy startproject test

[zhouhh@Hadoop48 test]$ find .

[zhouhh@Hadoop48 test]$ cat test/spiders/
from scrapy.spider import BaseSpider

class TestSpider(BaseSpider):
    name = "hadoop48"
    allowed_domains = ["hadoop48"]
    start_urls = [

    def parse(self, response):
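        # Name the output file after the second-to-last "/"-separated piece of the URL.
        filename = response.url.split("/")[-2]
        # A context manager closes the file once the body has been written.
        with open(filename, 'wb') as f:
            f.write(response.body)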
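
The filename logic in parse can be tried on its own. This is a minimal sketch, assuming placeholder URLs (the spider's actual start URL is not shown above) and a hypothetical helper name, filename_for:

```python
def filename_for(url):
    """Return the second-to-last "/"-separated piece of url, as parse() does."""
    return url.split("/")[-2]


if __name__ == "__main__":
    # Placeholder URLs for illustration only, not the ones used by the spider.
    for url in ("http://example.com/", "http://example.com/docs/"):
        print(url, "->", filename_for(url))
```

The result depends on the trailing slash: http://example.com/docs/ yields docs, while http://example.com/docs yields example.com.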

[zhouhh@Hadoop48 test]$ cat test/

from scrapy.item import Item, Field

class TestItem(Item):
    # define the fields for your item here like:
    title = Field()
    link = Field()
    desc = Field()

[zhouhh@Hadoop48 test]$ scrapy crawl test
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
  File "/usr/local/lib/python2.7/site-packages/scrapy/", line 96, in execute
    settings = get_project_settings()
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/", line 56, in get_project_settings
    settings_module = __import__(settings_module_path, {}, {}, [''])
ImportError: No module named settings


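The project name test most likely clashes with Python's standard-library test package, so test.settings cannot be found. Create the project again under a different name, test1:

[zhouhh@Hadoop48 python]$ scrapy startproject test1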
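
To check in advance whether a project name is already taken by a module Python can import, something like the following could be used. It is a sketch that assumes Python 3.4 or later (for importlib.util.find_spec), which is newer than the Python 2.7 used in this post. The helper name is_name_taken is hypothetical.

```python
import importlib.util


def is_name_taken(name):
    """Return True if a top-level module or package called name is importable."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Candidate project names; the result depends on the local Python install.
    for candidate in ("test", "test1"):
        print(candidate, "taken:", is_name_taken(candidate))
```

Run it from a directory that does not already contain the projects, because directories next to the script are importable too and would be reported as taken.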


[zhouhh@Hadoop48 test1]$ scrapy crawl test1

KeyError: 'Spider not found: test1'


Change the name attribute of class Test1Spider from hadoop48 to test1, so that it matches the name passed to scrapy crawl.
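
The KeyError suggests that scrapy crawl looks a spider up by its name attribute, not by the project name or class name. Below is a simplified illustration of that kind of lookup in plain Python. It is not Scrapy's actual implementation, and build_registry and find_spider are hypothetical helpers.

```python
class Hadoop48Spider(object):
    name = "hadoop48"


class Test1Spider(object):
    name = "test1"


def build_registry(spider_classes):
    """Map each spider's name attribute to its class."""
    return {cls.name: cls for cls in spider_classes}


def find_spider(registry, name):
    """Return the spider class registered under name, or raise KeyError."""
    if name not in registry:
        raise KeyError("Spider not found: %s" % name)
    return registry[name]


if __name__ == "__main__":
    # With the spider renamed, looking up "test1" succeeds.
    registry = build_registry([Test1Spider])
    print(find_spider(registry, "test1").__name__)
```

With only Hadoop48Spider registered, find_spider(registry, "test1") raises the KeyError, which mirrors the failure seen before the rename.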

[zhouhh@Hadoop48 test1]$ scrapy crawl test1
2012-10-31 13:49:24+0800 [scrapy] INFO: Scrapy 0.16.1 started (bot: test1)

2012-10-31 13:49:24+0800 [test1] INFO: Spider closed (finished)


[zhouhh@Hadoop48 test1]$ ls
hadoop48  scrapy.cfg  test1
[zhouhh@Hadoop48 test1]$ cat hadoop48

list tables demo of zhouhh 获取全部表名
167094287 10.28 json详单
100004458 10.29表格
