ADAM NAAMANI

Background Processing with RETS and Sidekiq

May 06, 2020

Managing large quantities of real estate data is computationally intensive and well suited to background processing. The task involves importing thousands of MLS® listings into a Redis in-memory data structure store, geocoding them with an open government API, and associating them with other models. A lot can go wrong, so it's important to isolate these functions according to the single responsibility principle and separation of concerns.

This is an attempt to find the optimal setup on Heroku Redis with regard to concurrency and pool size, while gracefully dealing with Timeout, 429 Too Many Requests, and ERR max number of clients reached errors. I've predominantly worked with two libraries that tie perfectly into Rails' ActiveJob: Resque and Sidekiq. My preference leans toward Sidekiq, not only for its sweet karate logo but also for its creator, who open-sourced the software and charged money for Pro features, which allowed him to quit his job:

"I've been working daily for the last 5 years as a solo entrepreneur, building as much value into my commercial products and automating my business as much as possible. It's time to take a vacation and enjoy my success for a few months — relax and enjoy life while the products sell themselves." – Mike Perham

Suffice it to say that his enthusiasm for software engineering and independence is reflected in the product. It helps that he frequently answers questions on Stack Overflow when you run into issues, and he also holds a happy hour for support. Sidekiq integrates tightly with ActiveJob, which has mostly worked well so far.
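To make the concurrency and pool-size trade-off concrete, here is a rough back-of-the-envelope sketch in plain Ruby. Every number in it is an assumption for illustration: the per-process overhead of 5 connections, the dyno counts, the client pool size, and the 20-connection limit are all hypothetical, so check them against your Sidekiq version and your Redis plan before relying on the result.

```ruby
# Rough Redis connection budget for an app running Sidekiq.
# All numbers used here are illustrative assumptions, not measured values.

# Connections one Sidekiq server process may open: one per worker thread
# plus a few for Sidekiq's own internals (the overhead is an assumption).
def server_connections(concurrency:, overhead: 5)
  concurrency + overhead
end

# Total connections across worker dynos (Sidekiq servers) and web dynos
# (Sidekiq clients that only enqueue jobs).
def total_connections(worker_dynos:, concurrency:, web_dynos:, client_pool:, overhead: 5)
  servers = worker_dynos * server_connections(concurrency: concurrency, overhead: overhead)
  clients = web_dynos * client_pool
  servers + clients
end

# Largest concurrency that keeps the total within a given connection limit.
def max_concurrency(limit:, worker_dynos:, web_dynos:, client_pool:, overhead: 5)
  available = limit - (web_dynos * client_pool)
  [(available / worker_dynos) - overhead, 0].max
end

needed = total_connections(worker_dynos: 1, concurrency: 10, web_dynos: 1, client_pool: 5)
ceiling = max_concurrency(limit: 20, worker_dynos: 1, web_dynos: 1, client_pool: 5)

puts "connections needed: #{needed}"
puts "max concurrency under the limit: #{ceiling}"
```

If the first number exceeds what the Redis plan allows, ERR max number of clients reached is the likely symptom, and lowering concurrency or the client pool size is usually cheaper than upgrading the plan.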

Jobs within a Job

This one took me a while to figure out. It doesn't make much sense to perform a request to a third-party API outside of a job only to pass the result to a job. That request could time out or respond with a 400, and doing it up front is not the most effective way to use background processing as it was intended. I ended up creating one job that connects to the RETS server using Estately's RETS library, loops over all the records, and queues a new job for every row.

Connect to RETS client:

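module Rets
  extend ActiveSupport::Concern

  def connect
    max_retries = 5
    attempts = 0

    begin
      # :endpoint, :user and :password are placeholders for real credentials
      @client = Rets::Client.new(
        login_url: :endpoint,
        username: :user,
        password: :password,
        version: 'RETS/1.5',
        max_retries: max_retries
      )
      @client.login
    rescue Timeout::Error => e
      Rails.logger.error(e)
      attempts += 1
      # retry re-runs only the begin block, so the counter is not reset
      retry if attempts <= max_retries
      raise
    end
  end

  def disconnect
    # connect may have failed before a client was assigned
    @client&.logout
  end
end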
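The retry logic in connect can also be pulled out into a small reusable helper. The sketch below is plain Ruby and is not part of the RETS library or Sidekiq; the name with_retries, its parameters, and the injectable sleeper are all hypothetical. It retries a block on the listed errors and waits exponentially longer between tries, which is also a reasonable way to back off after a 429 Too Many Requests.

```ruby
require 'timeout'

# Runs the block, retrying on the given errors with exponential backoff.
# attempts is the total number of tries; base_delay is in seconds.
# sleeper is injectable so the helper can be exercised without real waiting.
def with_retries(attempts: 5, base_delay: 0.5, on: [Timeout::Error], sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    # Out of attempts: re-raise so the caller (or Sidekiq) sees the failure.
    raise if tries >= attempts

    # Delay doubles on every failure: base_delay, 2x, 4x, ...
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

# Usage: a hypothetical flaky call that times out twice, then succeeds.
calls = 0
result = with_retries(base_delay: 0.01) do |try|
  calls += 1
  raise Timeout::Error, "attempt #{try} timed out" if calls < 3

  :logged_in
end
puts "#{result} after #{calls} calls"
```

Keeping the counter outside the begin block is the important detail: retry only re-runs the begin block, so the number of tries is preserved between attempts.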

Import records:

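module Import
  class ListingJob < ApplicationJob
    include Rets # provides connect and disconnect

    queue_as :priority

    before_perform :connect
    after_perform :disconnect

    sidekiq_options retry: 5

    def perform(**args)
      records = @client.find(
        :all,
        search_type: args[:search_type],
        class: args[:property_class],
        resolve: true
      )

      return if records.blank?

      # Fan out: one small insert job per row
      records.each do |record|
        Insert::ListingJob.perform_later(record)
      end
    rescue StandardError => e
      # Note: swallowing the error here means Sidekiq sees a successful job
      # and will not retry it; re-raise after reporting if retries are wanted.
      Rails.logger.error(e)
      Raven.capture_exception(e)
    end
  end
end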

Insert record:

module Insert
  class ListingJob < ApplicationJob
    queue_as :priority

    def perform(record)
      Listings::Create.call(record) if record.present?
    end
  end
end
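Insert::ListingJob delegates to Listings::Create, which this post does not show. As a rough idea of the shape such a service could take, here is a hypothetical sketch. An in-memory hash stands in for the database so the example runs on its own, and the mls_number field name is an assumption. Because a retried job can run more than once, the sketch upserts by key instead of blindly inserting.

```ruby
# Hypothetical sketch of a PORO service exposing a call class method.
# An in-memory hash stands in for the database; a real service would
# presumably write to an ActiveRecord model instead.
module Listings
  class Create
    STORE = {} # stand-in for a listings table, keyed by MLS number

    def self.call(record)
      new(record).call
    end

    def initialize(record)
      # Job arguments may arrive with string or symbol keys; normalise them.
      @record = record.transform_keys(&:to_s)
    end

    # Upserts by MLS number so a retried job does not create duplicates.
    # Returns the stored attributes, or nil when the record has no key.
    def call
      mls_number = @record['mls_number'] # assumed field name
      return if mls_number.nil? || mls_number.to_s.empty?

      STORE[mls_number] = (STORE[mls_number] || {}).merge(@record)
    end
  end
end

Listings::Create.call({ 'mls_number' => 'X123', 'price' => '500000' })
Listings::Create.call({ mls_number: 'X123', price: '525000' }) # same listing again
puts Listings::Create::STORE.inspect
```

Running the same record through twice leaves a single entry with the latest values, which is the property that makes per-row jobs safe to retry.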

Sidekiq then calls a Plain Old Ruby Object (PORO) service to handle the interaction with the database. The operation can be seen through Sidekiq's sleek dashboard:

require 'sidekiq/web'
require 'sidekiq-scheduler/web'

Rails.application.routes.draw do
  mount Sidekiq::Web => '/sidekiq'
end

The jobs can be controlled through the UI, or programmatically through the Rails console, which makes it super easy to manage:

2.7.1 > queue = Sidekiq::Queue.new('priority')
2.7.1 > queue.each do |job|
2.7.1 >   job.delete
2.7.1 > end