I have a Scrapy web scraper whose pipeline receives data items, and I want to insert them into a database asynchronously.
For example, here is a transaction that inserts the parts of an item into my database using SQLAlchemy Core:

```python
def process_item(self, item, spider):
    with self.connection.begin() as conn:
        conn.execute(insert(table1).values(item['part1']))
        conn.execute(insert(table2).values(item['part2']))
```
I understand that SQLAlchemy Core can be used asynchronously with Twisted via alchimia; a sample from the alchimia documentation is included at the end of this question.
What I do not understand is how to fit my code above into the alchimia structure. How do I configure process_item to use a reactor?
Can I do something like this?
```python
@inlineCallbacks
def process_item(self, item, spider):
    with self.connection.begin() as conn:
        yield conn.execute(insert(table1).values(item['part1']))
        yield conn.execute(insert(table2).values(item['part2']))
```
And how would I write the reactor part itself?
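Here is my rough guess at how the pipeline might be wired up (untested; I am assuming that Scrapy's own Twisted reactor can be handed to alchimia, that alchimia's engine.execute returns a Deferred, and that Scrapy will wait on a Deferred returned from process_item; table1 and table2 are the Table objects from my code above, and I have dropped the explicit transaction for simplicity):

```python
from alchimia import TWISTED_STRATEGY
from sqlalchemy import create_engine, insert
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks, returnValue


class AlchimiaPipeline(object):
    """My guess: Scrapy already runs on Twisted, so its global
    reactor can be passed straight to alchimia's engine."""

    def open_spider(self, spider):
        self.engine = create_engine(
            "sqlite:///scrapy.db", reactor=reactor, strategy=TWISTED_STRATEGY
        )

    @inlineCallbacks
    def process_item(self, item, spider):
        # engine.execute should return a Deferred here, so each insert
        # is awaited without blocking the reactor thread
        yield self.engine.execute(insert(table1).values(item['part1']))
        yield self.engine.execute(insert(table2).values(item['part2']))
        returnValue(item)
```

Is returning the Deferred produced by inlineCallbacks enough for Scrapy to wait on it, or is more plumbing needed?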
Or is there an easier way to do non-blocking database inserts in the Scrapy pipeline?
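For instance, I have seen twisted.enterprise.adbapi mentioned, which runs ordinary blocking DB-API calls in a thread pool. Would a sketch like the following be the simpler route? (Untested; the sqlite3 driver, file name, and SQL are placeholders, and in reality item['part1'] is a dict of column values that would need mapping to columns.)

```python
from twisted.enterprise import adbapi


class AdbapiPipeline(object):
    """A possibly simpler alternative: run blocking DB-API calls
    in Twisted's thread pool instead of going through alchimia."""

    def open_spider(self, spider):
        # check_same_thread=False lets sqlite3 connections be used
        # from the pool's worker threads
        self.dbpool = adbapi.ConnectionPool(
            "sqlite3", "scrapy.db", check_same_thread=False
        )

    def process_item(self, item, spider):
        # runInteraction executes _do_insert in a worker thread and
        # returns a Deferred, so the reactor is never blocked
        d = self.dbpool.runInteraction(self._do_insert, item)
        d.addCallback(lambda _: item)
        return d

    def _do_insert(self, txn, item):
        # placeholder SQL; real code would map the item's fields
        # to the actual columns of table1 and table2
        txn.execute("INSERT INTO table1 (col) VALUES (?)", (item['part1'],))
        txn.execute("INSERT INTO table2 (col) VALUES (?)", (item['part2'],))

    def close_spider(self, spider):
        self.dbpool.close()
```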
For reference, here is the sample code from the alchimia documentation:
```python
from alchimia import TWISTED_STRATEGY
from sqlalchemy import (
    create_engine, MetaData, Table, Column, Integer, String
)
from sqlalchemy.schema import CreateTable
from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react


@inlineCallbacks
def main(reactor):
    engine = create_engine(
        "sqlite://", reactor=reactor, strategy=TWISTED_STRATEGY
    )

    metadata = MetaData()
    users = Table("users", metadata,
        Column("id", Integer(), primary_key=True),
        Column("name", String()),
    )

    # Create the table
    yield engine.execute(CreateTable(users))


if __name__ == "__main__":
    react(main, [])
```