I have a CarComparison website that pulls feeds from other sites. One of the feeds it pulls is from a site that allows people who post an ad to update it several times. Typically, a car ad is updated every 10-14 days.
In any case, my only access to their data is through the RSS feed, which I parse to extract the useful data. I fetch it every minute, and there are usually about 15 new cars.
There is no easy way at import time to find out whether a car is already in the system. I am storing the original identifier so that I can check it later.
The query that I run to join the tables is:
SELECT DISTINCT cc_detail.original_id, cc_detail.year, cc_detail.price, cc_detail.make, cc_detail.model, cc_detail.referrer_site, wposts.*
FROM cc_posts wposts
LEFT JOIN cc_posts_detail cc_detail ON (wposts.ID = cc_detail.post_id)
WHERE 1 = 1
  AND (cc_detail.year > 1949)
  AND (cc_detail.price > 0)
  AND cc_detail.referrer_site = 'CarSiteX'
  AND wposts.post_status = 'publish'
  AND wposts.post_type = 'post'
  AND wposts.post_date < NOW()
  AND cc_detail.year <= 2011
  AND wposts.post_title NOT LIKE 'Ac%'
  AND cc_detail.make != ''
  AND cc_detail.model != ''
  AND (cc_detail.price + 0) > 100
  AND (wposts.post_date > "2011/01/02 ")
ORDER BY cc_detail.original_id ASC
LIMIT 30, 300
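As you can see, I am ordering by original_id. For CarSiteX, the same car can show up more than once with the same original_id. How can I get only one row, the most recent post, for each original_id in cc_posts_detail?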
Here is a sample of the results:
original_id year price make model referrer_site ID post_author post_date post_date_gmt post_content post_title post_excerpt post_status comment_status ping_status post_password post_name to_ping pinged post_modified post_modified_gmt post_content_filtered post_parent guid menu_order post_type post_mime_type comment_count
1143583 2000 2900 lexus is200 CarSitex 9633341 1 2011-01-19 05:34:01 2011-01-19 12:34:01 2000 Manual 2.0 Petrol 136k miles NCT 039 d 0... Lexus Is200 2000 publish open open lexus-is200-2000- 2011-01-19 05:34:01 2011-01-19 12:34:01 0 0 post 0
1149513 1997 2000 mitsubishi colt CarSitex 8978523 1 2011-01-05 12:26:01 2011-01-05 19:26:01 1600cc mivec twin cam 16valve. 175 bhp.Four br... Mitsubishi Colt 1997 publish open open mitsubishi-colt-1997- 2011-01-05 12:26:01 2011-01-05 19:26:01 0 0 post 0
1149513 1997 2000 mitsubishi colt CarSitex 9416296 1 2011-01-14 12:04:01 2011-01-14 19:04:01 1600cc mivec twin cam 16valve. 175 bhp.Four br... Mitsubishi Colt 1997 publish open open mitsubishi-colt-1997- 2011-01-14 12:04:01 2011-01-14 19:04:01 0 0 post 0
1156791 2004 5950 ford focus CarSitex 9163527 1 2011-01-08 10:04:01 2011-01-08 17:04:01 2004 FORD FOCUS 1.4 4 DOOR 78333 MILES NCT D 1... Ford Focus 2004 publish open open ford-focus-2004- 2011-01-08 10:04:01 2011-01-08 17:04:01 0 0 post 0
As you can see, the mitsubishi appears twice with the same original_id.
cc_post_details:
id int(4)
referrer_site varchar(100)
original_id bigint(8)
dealer varchar(255)
make varchar(100)
model varchar(100)
colour varchar(100)
year varchar(8)
engine_size int(4)
mileage int(4)
price int(4)
location varchar(100)
fuel_type varchar(50)
body_type varchar(50)
transmission varchar(50)
doors int(4)
image_base_url varchar(255)
image_main text
image_thumb text
post_id int(4)
date_added datetime
underscore_beepbeep_pos int(11)
cc_posts:
ID bigint(20)
post_author bigint(20)
post_date datetime
post_date_gmt datetime
post_content longtext
post_title text
post_excerpt text
post_status varchar(20)
comment_status varchar(20)
ping_status varchar(20)
post_password varchar(20)
post_name varchar(200)
to_ping text
pinged text
post_modified datetime
post_modified_gmt datetime
post_content_filtered text
post_parent bigint(20)
guid varchar(255)
menu_order int(11)
post_type varchar(20)
post_mime_type varchar(100)
comment_count bigint(20)
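For reference, here is a rough sketch of the kind of result being asked for: one row per original_id, keeping the post with the newest post_date. It assumes that the newest post_date marks the current version of an ad, and it uses the table name cc_posts_detail as written in the query. The aliases p, d and latest and the column latest_post_date are made up for illustration. In the sample output, original_id 1149513 appears with posts 8978523 (2011-01-05) and 9416296 (2011-01-14); under these assumptions only 9416296 would be returned.

```sql
-- Sketch only: keep the most recent published post per original_id.
-- Assumption: the newest post_date identifies the current version of an ad.
SELECT cc_detail.original_id, cc_detail.year, cc_detail.price,
       cc_detail.make, cc_detail.model, cc_detail.referrer_site,
       wposts.*
FROM cc_posts wposts
INNER JOIN cc_posts_detail cc_detail
        ON wposts.ID = cc_detail.post_id
INNER JOIN (
    -- Newest post_date for each original_id from this referrer.
    SELECT d.original_id, MAX(p.post_date) AS latest_post_date
    FROM cc_posts p
    INNER JOIN cc_posts_detail d ON p.ID = d.post_id
    WHERE d.referrer_site = 'CarSiteX'
      AND p.post_status = 'publish'
      AND p.post_type = 'post'
    GROUP BY d.original_id
) latest
        ON latest.original_id = cc_detail.original_id
       AND latest.latest_post_date = wposts.post_date
WHERE cc_detail.referrer_site = 'CarSiteX'
  AND wposts.post_status = 'publish'
  AND wposts.post_type = 'post'
ORDER BY cc_detail.original_id ASC;
```

Two caveats with this sketch: if two posts for the same original_id share exactly the same post_date, both rows come back, and the remaining filters (year, price, make, model, post_title) would need to be added to both the inner and the outer query so that they agree on which post counts as the latest.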