How to give each scrapy-redis crawler project its own Redis db instead of always using db0

Source: repost

Background

By default Redis provides 16 logical databases, db0 through db15. A scrapy-redis distributed crawler uses db0 by default to store its dupefilter, its request (seed) queue, and its item data. In practice, though, we rarely run just one crawler project, and if every project writes into the same database, things get mixed up quickly. Giving each crawler project its own db is therefore well worth doing.


Environment

OS: Windows 7
scrapy-redis
redis 3.0.5
Python 3.6.1

Analysis

Let's first walk through the scrapy-redis source to find where the db gets configured.


Step 1: ./Lib/site-packages/scrapy_redis/scheduler.py


```python
@classmethod
def from_settings(cls, settings):
    kwargs = {
        'persist': settings.getbool('SCHEDULER_PERSIST'),
        'flush_on_start': settings.getbool('SCHEDULER_FLUSH_ON_START'),
        'idle_before_close': settings.getint('SCHEDULER_IDLE_BEFORE_CLOSE'),
    }
    # If these values are missing, it means we want to use the defaults.
    optional = {
        # TODO: Use custom prefixes for this settings to note that are
        # specific to scrapy-redis.
        'queue_key': 'SCHEDULER_QUEUE_KEY',
        'queue_cls': 'SCHEDULER_QUEUE_CLASS',
        'dupefilter_key': 'SCHEDULER_DUPEFILTER_KEY',
        # We use the default setting name to keep compatibility.
        'dupefilter_cls': 'DUPEFILTER_CLASS',
        'serializer': 'SCHEDULER_SERIALIZER',
    }
    for name, setting_name in optional.items():
        val = settings.get(setting_name)
        if val:
            kwargs[name] = val
    # Support serializer as a path to a module.
    if isinstance(kwargs.get('serializer'), six.string_types):
        kwargs['serializer'] = importlib.import_module(kwargs['serializer'])
    # Initialize the redis server.
    server = connection.from_settings(settings)
    # Ensure the connection is working.
    server.ping()
    return cls(server=server, **kwargs)
```

This calls the from_settings function in connection.py to initialize the Redis server.




Step 2: ./Lib/site-packages/scrapy_redis/connection.py


```python
def get_redis_from_settings(settings):
    """Returns a redis client instance from given Scrapy settings object.

    This function uses ``get_client`` to instantiate the client and uses
    ``defaults.REDIS_PARAMS`` global as defaults values for the parameters. You
    can override them using the ``REDIS_PARAMS`` setting.

    Parameters
    ----------
    settings : Settings
        A scrapy settings object. See the supported settings below.

    Returns
    -------
    server
        Redis client instance.

    Other Parameters
    ----------------
    REDIS_URL : str, optional
        Server connection URL.
    REDIS_HOST : str, optional
        Server host.
    REDIS_PORT : str, optional
        Server port.
    REDIS_ENCODING : str, optional
        Data encoding.
    REDIS_PARAMS : dict, optional
        Additional client parameters.

    """
    params = defaults.REDIS_PARAMS.copy()
    # This is the key spot: our custom redis parameters are merged in here.
    params.update(settings.getdict('REDIS_PARAMS'))
    # XXX: Deprecate REDIS_* settings.
    for source, dest in SETTINGS_PARAMS_MAP.items():
        val = settings.get(source)
        if val:
            params[dest] = val
    # Allow ``redis_cls`` to be a path to a class.
    if isinstance(params.get('redis_cls'), six.string_types):
        params['redis_cls'] = load_object(params['redis_cls'])
    return get_redis(**params)


# Backwards compatible alias.
from_settings = get_redis_from_settings
```

As the code above shows, the parameters are read from the REDIS_PARAMS setting and then passed into get_redis(**params) to initialize the redis server, shown below:


```python
def get_redis(**kwargs):
    """Returns a redis client instance.

    Parameters
    ----------
    redis_cls : class, optional
        Defaults to ``redis.StrictRedis``.
    url : str, optional
        If given, ``redis_cls.from_url`` is used to instantiate the class.
    **kwargs
        Extra parameters to be passed to the ``redis_cls`` class.

    Returns
    -------
    server
        Redis client instance.

    """
    redis_cls = kwargs.pop('redis_cls', defaults.REDIS_CLS)
    url = kwargs.pop('url', None)
    if url:
        return redis_cls.from_url(url, **kwargs)
    else:
        return redis_cls(**kwargs)
```
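
To see exactly how the db number ends up on the client, here is a minimal sketch of the merge order performed by get_redis_from_settings, using plain dicts so it runs without Scrapy or a Redis server. The names `DEFAULT_PARAMS`, `SETTINGS_PARAMS_MAP`, and `build_redis_params` are illustrative stand-ins, not the real scrapy-redis objects:

```python
# Sketch of the parameter merge in get_redis_from_settings:
# defaults, then REDIS_PARAMS, then the legacy REDIS_* settings.
# All names here are hypothetical stand-ins for the real objects.

DEFAULT_PARAMS = {'socket_timeout': 30, 'db': 0}  # akin to defaults.REDIS_PARAMS

# Maps legacy REDIS_* settings onto client keyword arguments.
SETTINGS_PARAMS_MAP = {
    'REDIS_HOST': 'host',
    'REDIS_PORT': 'port',
}

def build_redis_params(project_settings):
    """Replicate the merge order: defaults < REDIS_PARAMS < REDIS_*."""
    params = DEFAULT_PARAMS.copy()
    params.update(project_settings.get('REDIS_PARAMS', {}))
    for source, dest in SETTINGS_PARAMS_MAP.items():
        val = project_settings.get(source)
        if val:
            params[dest] = val
    return params

params = build_redis_params({
    'REDIS_HOST': '192.168.1.99',
    'REDIS_PORT': 6379,
    'REDIS_PARAMS': {'db': 2, 'password': 'secret'},
})
print(params['db'])    # → 2 (from REDIS_PARAMS)
print(params['host'])  # → 192.168.1.99 (from REDIS_HOST)
```

Note the order: because the REDIS_* loop runs last, a REDIS_HOST setting would override a 'host' key placed inside REDIS_PARAMS, while 'db' has no legacy setting and so always comes from REDIS_PARAMS.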


Method

Given the analysis above, the fix is simple: configure the REDIS_PARAMS setting for the spider. Setting a password works the same way.
```python
# Use db2 for this spider
class MySpider(RedisSpider):
    """Spider that reads urls from redis queue (myspider:start_urls)."""
    name = 'xxxx'
    redis_key = 'xxxx:start_urls'
    # ……
    custom_settings = {
        'LOG_LEVEL': 'DEBUG',
        'DOWNLOAD_DELAY': 0,
        # Redis connection parameters
        'REDIS_HOST': '192.168.1.99',
        'REDIS_PORT': 6379,
        # Redis password and which db to use
        'REDIS_PARAMS': {
            'password': 'redisPasswordTest123456',
            'db': 2,
        },
    }
```
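
If you'd rather not hard-code the connection inside each spider, the same keys can live in the project's settings.py instead, since get_redis_from_settings reads them from the project settings either way. A sketch, under the assumption that every spider in this project should share db 2 (host, port, and password are placeholder values):

```python
# settings.py (sketch): the whole project uses db 2.
# Host, port, and password are placeholder values.
REDIS_HOST = '192.168.1.99'
REDIS_PORT = 6379
REDIS_PARAMS = {
    'password': 'redisPasswordTest123456',
    'db': 2,
}
```

With this in place, custom_settings is only needed for spiders that must deviate from the project-wide db.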

The result looks like this:


![screenshot](/2014th7cj/d/file/p/20180122/3ni3jr2cnut.png)
Notes:

After switching databases, you must point at that same db when pushing start_urls and when transferring data from redis to mongodb. For example, when creating the redis connection:

```python
# db must match the one configured in REDIS_PARAMS (db 2 here)
rediscli = redis.Redis(host=redis_Host, port=6379, db=2)
```
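
Another way to keep helper scripts (start_urls pushers, redis-to-mongodb dumpers) on the same database is to carry the db in a single connection URL, since scrapy-redis also honors REDIS_URL and redis clients' from_url reads the db from the URL's trailing path segment. A small sketch, with a made-up URL and credentials, showing how that path segment maps to a db number:

```python
from urllib.parse import urlparse

# Hypothetical shared URL; the trailing "/2" selects db 2,
# the same convention redis_cls.from_url() follows.
REDIS_URL = 'redis://:redisPasswordTest123456@192.168.1.99:6379/2'

def db_from_url(url):
    """Extract the db number from a redis:// URL's path segment."""
    path = urlparse(url).path.lstrip('/')
    return int(path) if path else 0  # no path segment means db 0

print(db_from_url(REDIS_URL))  # → 2
```

Defining the URL once and importing it everywhere makes it harder for a helper script to silently fall back to db0.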


These notes come from Kosmoo: link
