Missing specification for items
I have written spiders for some pages and am about to promote them to production, so I need to add tests for them. I have been using scrapyrt to make the scrapers easy to consume, but when I run scrapy-test, it prints the following message after the spider's execution output:
============================= elapsed 9.31 seconds =============================
Missing specification for <class 'items.RegistrationItem'> [x2]
TestStats2: item_scraped_count: int:1 != str:I'm here!
TestStats2: item_scraped_count: int:1 != str:meow
TestStats2: some_missing_stat: missing
Missing specification for <class 'items.GeneralInfoItem'>
Missing specification for <class 'items.CommercialRegisterItem'>
Missing specification for <class 'items.EconomicActivitesItem'>
Missing specification for <class 'items.FacultiesItem'>
================ TestRuesAntiCaptchaSpider failed 1 field tests ================
================ TestRuesAntiCaptchaSpider failed 3 stat tests =================
============ TestRuesAntiCaptchaSpider failed 5 field coverage tests============
I also had to change my scraper to return Item objects instead of the JSON it originally returned. With items, the spider runs perfectly and shows all the scraped and loaded data.
The structure of my project is:
├── externalSources
│   ├── generalLibraries
│   │   ├── Config
│   │   ├── __pycache__
│   │   │   └── constants.cpython-37.pyc
│   │   └── constants.py
│   ├── help
│   │   ├── Estándar\ de\ Codificación\ Python.pdf
│   │   ├── generalLibraries
│   │   └── rues
│   └── projects
│       └── ruesAntiCaptcha
│           ├── config
│           ├── ejemploSalida
│           │   └── ejemplo.txt
│           ├── libraries
│           │   ├── __pycache__
│           │   │   ├── constants.cpython-37.pyc
│           │   │   └── resolveReCaptcha.cpython-37.pyc
│           │   ├── constants.py
│           │   └── resolveReCaptcha.py
│           └── ruesAntiCaptcha
│               ├── logs
│               │   └── ruesAntiCaptcha
│               ├── rues_tests
│               │   ├── __init__.py
│               │   ├── __pycache__
│               │   │   ├── __init__.cpython-37.pyc
│               │   │   ├── items.cpython-37.pyc
│               │   │   ├── scrapytests.cpython-37.pyc
│               │   │   ├── spiders.cpython-37.pyc
│               │   │   └── stats.cpython-37.pyc
│               │   ├── items.py
│               │   ├── scrapytests.py
│               │   ├── spiders.py
│               │   └── stats.py
│               ├── scrapy.cfg
│               └── scrapyRuesAntiCaptcha
│                   ├── __init__.py
│                   ├── __pycache__
│                   │   ├── __init__.cpython-37.pyc
│                   │   └── items.cpython-37.pyc
│                   ├── items.py
│                   ├── logs
│                   │   └── ruesAntiCaptcha
│                   ├── middlewares.py
│                   ├── pipelines.py
│                   ├── settings.py
│                   └── spiders
│                       ├── __init__.py
│                       ├── __pycache__
│                       │   ├── __init__.cpython-37.pyc
│                       │   └── rues.cpython-37.pyc
│                       └── rues.py
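In this tree, items.py lives in the scrapyRuesAntiCaptcha package, and the test file imports it as scrapyRuesAntiCaptcha.items, yet the messages name items.RegistrationItem. One possible explanation, which is only an assumption here: scrapy-test may match items to specs by their class. The same items.py might also be importable as a top-level module named items, for example through a sys.path tweak in the spider, which is not shown. In that case Python creates two separate class objects from one file. The standard-library sketch below reproduces that effect. The package demo_pkg, the module demo_items and the stub class are hypothetical, and the dictionary lookup is a simplified stand-in, not scrapy-test's code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway package standing in for scrapyRuesAntiCaptcha/items.py.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "demo_items.py").write_text("class RegistrationItem:\n    pass\n")

# Put both the project root and the package directory on sys.path, so the
# same file is importable as "demo_pkg.demo_items" and as plain "demo_items".
sys.path.insert(0, str(root))
sys.path.insert(0, str(pkg))
importlib.invalidate_caches()

as_package = importlib.import_module("demo_pkg.demo_items")
as_top_level = importlib.import_module("demo_items")

cls_a = as_package.RegistrationItem
cls_b = as_top_level.RegistrationItem

print(cls_a)           # <class 'demo_pkg.demo_items.RegistrationItem'>
print(cls_b)           # <class 'demo_items.RegistrationItem'>
print(cls_a is cls_b)  # False: one file, two distinct class objects

# Simplified stand-in for a class-keyed spec lookup (NOT scrapy-test's code):
# a spec registered for one copy is not found for items of the other copy.
specs = {cls_a: "spec for RegistrationItem"}
lookup = specs.get(cls_b)
print(lookup)          # None
```

If something like this is happening, the two copies report different module paths, just as items.RegistrationItem differs from scrapyRuesAntiCaptcha.items.RegistrationItem.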
The code of scrapytests.py is:
# import test classes here
# test spiders
from rues_tests.spiders import *
# item tests
from rues_tests.items import *
# stats tests
from rues_tests.stats import *
# You can define any scrapy settings here
LOG_LEVEL = 'INFO'
And the code of items.py in the testing folder (rues_tests) is:
import sys
import os
from scrapytest.tests import Match, Type, MoreThan, Required

sys.path.append(os.path.join(os.path.dirname(__file__), '../'))
try:
    from scrapyRuesAntiCaptcha.items import RegistrationItem, GeneralInfoItem, \
        CommercialRegisterItem, FacultiesItem, EconomicActivitesItem, \
        RegistrationItemLoader, GeneralInfoItemLoader, \
        CommercialRegisterItemLoader, EconomicActivitesItemLoader, FacultiesItemLoader
except Exception as e:
    print(e)
from scrapytest.spec import ItemSpec

"""
Item tests here are defined for every Item object the crawler might return
"""


class TestGeneralInfoItem(ItemSpec):
    # defining item that is being covered
    try:
        item = GeneralInfoItem()
    except Exception as e:
        print(e)

    # defining field tests
    companyName_test = Type(str), MoreThan(0)
    companyInitials_test = Type(str)
    commerceAuthority_test = Type(str), MoreThan(0)
    companyID_test = Type(str), MoreThan(0)

    # also supports methods!
    def url_test(self, value: str):
        if not value.startswith('http'):
            return f'Invalid url: {value}'
        return ''


class TestRegistrationItem(ItemSpec):
    item = RegistrationItem()
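The url_test method above follows a simple convention. It returns an empty string when the value passes and a message when it fails. That logic can be checked on its own, without running a crawl. The sketch below pulls it into a plain function; the name check_url and the sample values are hypothetical, not taken from the spider.

```python
def check_url(value: str) -> str:
    """Same logic as url_test: '' means pass, any other string is the failure message."""
    if not value.startswith('http'):
        return f'Invalid url: {value}'
    return ''


# Hypothetical sample values, not data scraped by the spider.
for sample in ['https://example.com/company/1', 'www.example.com']:
    result = check_url(sample)
    print(repr(sample), '->', 'pass' if not result else result)
```

Note that a plain startswith('http') check also accepts strings such as 'httpfoo', so it is a loose sanity check rather than full URL validation.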
Thanks for any help.