Missing specification for items

I started scraping some pages and now want to promote the scrapers to production, so I need to write tests for them. I have been using scrapyrt to make the scrapers easy to consume, but when I run scrapy-test, it shows the following message after the spider finishes executing:

============================= elapsed 9.31 seconds =============================
Missing specification for <class 'items.RegistrationItem'> [x2]
TestStats2: item_scraped_count: int:1 != str:I'm here!
TestStats2: item_scraped_count: int:1 != str:meow
TestStats2: some_missing_stat: missing
Missing specification for <class 'items.GeneralInfoItem'>
Missing specification for <class 'items.CommercialRegisterItem'>
Missing specification for <class 'items.EconomicActivitesItem'>
Missing specification for <class 'items.FacultiesItem'>
================ TestRuesAntiCaptchaSpider failed 1 field tests ================
================ TestRuesAntiCaptchaSpider failed 3 stat tests =================
============ TestRuesAntiCaptchaSpider failed 5 field coverage tests============ 

I also had to change my scraper to return item objects instead of JSON, as I was doing initially. The spider runs perfectly with items and shows all the data scraped and loaded.
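
One detail I noticed in the output: the classes are reported as `items.RegistrationItem`, while my test module imports them from `scrapyRuesAntiCaptcha.items`. I have not confirmed that scrapy-test matches specifications by class identity; that is only an assumption on my part. The sketch below shows the general Python behaviour in isolation: the same file imported under two module names yields two distinct classes. The package name `demo_project` is hypothetical and used only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_same_file_twice():
    """Import one items.py under two module names and return both classes."""
    root = Path(tempfile.mkdtemp())
    package = root / "demo_project"  # hypothetical package name
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "items.py").write_text("class RegistrationItem:\n    pass\n")

    # Both the project root and the package directory are put on sys.path,
    # so the file is reachable as `demo_project.items` and as plain `items`.
    sys.path.insert(0, str(root))
    sys.path.insert(0, str(package))
    importlib.invalidate_caches()

    qualified = importlib.import_module("demo_project.items").RegistrationItem
    bare = importlib.import_module("items").RegistrationItem
    return qualified, bare


qualified_cls, bare_cls = load_same_file_twice()
print(qualified_cls, bare_cls)
print(qualified_cls is bare_cls)              # False
print(isinstance(bare_cls(), qualified_cls))  # False
```

If this is what happens in my project, a specification declared for `scrapyRuesAntiCaptcha.items.RegistrationItem` would not be recognised for an item created from `items.RegistrationItem`, even though both come from the same file.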

The structure of my project is:

├── externalSources
│   ├── generalLibraries
│   │   ├── Config
│   │   ├── __pycache__
│   │   │   └── constants.cpython-37.pyc
│   │   └── constants.py
│   ├── help
│   │   ├── Estándar\ de\ Codificación\ Python.pdf
│   │   ├── generalLibraries
│   │   └── rues
│   └── projects
│       └── ruesAntiCaptcha
│           ├── config
│           ├── ejemploSalida
│           │   └── ejemplo.txt
│           ├── libraries
│           │   ├── __pycache__
│           │   │   ├── constants.cpython-37.pyc
│           │   │   └── resolveReCaptcha.cpython-37.pyc
│           │   ├── constants.py
│           │   └── resolveReCaptcha.py
│           └── ruesAntiCaptcha
│               ├── logs
│               │   └── ruesAntiCaptcha
│               ├── rues_tests
│               │   ├── __init__.py
│               │   ├── __pycache__
│               │   │   ├── __init__.cpython-37.pyc
│               │   │   ├── items.cpython-37.pyc
│               │   │   ├── scrapytests.cpython-37.pyc
│               │   │   ├── spiders.cpython-37.pyc
│               │   │   └── stats.cpython-37.pyc
│               │   ├── items.py
│               │   ├── scrapytests.py
│               │   ├── spiders.py
│               │   └── stats.py
│               ├── scrapy.cfg
│               └── scrapyRuesAntiCaptcha
│                   ├── __init__.py
│                   ├── __pycache__
│                   │   ├── __init__.cpython-37.pyc
│                   │   └── items.cpython-37.pyc
│                   ├── items.py
│                   ├── logs
│                   │   └── ruesAntiCaptcha
│                   ├── middlewares.py
│                   ├── pipelines.py
│                   ├── settings.py
│                   └── spiders
│                       ├── __init__.py
│                       ├── __pycache__
│                       │   ├── __init__.cpython-37.pyc
│                       │   └── rues.cpython-37.pyc
│                       └── rues.py

The code of scrapytests.py is:

# import test classes here
# test spiders
from rues_tests.spiders import *
# item tests
from rues_tests.items import *
# stats tests
from rues_tests.stats import *

# You can define any scrapy settings here
LOG_LEVEL = 'INFO'

and the code for items.py in the testing folder (rues_tests) is:

import sys
import os

from scrapytest.tests import Match, Type, MoreThan, Required

sys.path.append(os.path.join(os.path.dirname(__file__), '../'))

try:
    from scrapyRuesAntiCaptcha.items import RegistrationItem, GeneralInfoItem, \
    CommercialRegisterItem, FacultiesItem, EconomicActivitesItem, \
    RegistrationItemLoader, GeneralInfoItemLoader, \
    CommercialRegisterItemLoader, EconomicActivitesItemLoader, FacultiesItemLoader
except Exception as e:
    print(e)


from scrapytest.spec import ItemSpec

"""
Item tests here are defined for every Item object crawler might return
"""
class TestGeneralInfoItem(ItemSpec):
    # defining item that is being covered
    try:
        item = GeneralInfoItem()
    except Exception as e:
        print(e)
    # defining field tests
    companyName_test = Type(str), MoreThan(0)
    companyInitials_test = Type(str)
    commerceAuthority_test = Type(str), MoreThan(0)
    companyID_test = Type(str), MoreThan(0)

    # also supports methods!
    def url_test(self, value: str):
        if not value.startswith('http'):
            return f'Invalid url: {value}'
        return ''

class TestRegistrationItem(ItemSpec):
    item = RegistrationItem()

Thanks for any help.