My query joining two tables works fine, except that I cannot get rid of duplicates

I have a CarComparison website that pulls feeds from other sites. One of the feeds it pulls comes from a site that lets people who post an ad update it several times. Typically, a car ad is updated every 10-14 days.

In any case, my only access to their data is through the RSS feed, which I parse to extract the useful data. I fetch it every minute, and usually there are about 15 new cars.

There is no easy way at import time to find out whether a car is already in the system. I do store the original identifier, so I can check it later.

The query that I run to join the tables is:

SELECT DISTINCT cc_detail.original_id, cc_detail.year, cc_detail.price, cc_detail.make, cc_detail.model, cc_detail.referrer_site, wposts . *
FROM cc_posts wposts
LEFT JOIN cc_posts_detail cc_detail ON ( wposts.ID = cc_detail.post_id )
WHERE 1 =1
AND (
cc_detail.year >1949
)
AND (
cc_detail.price >0
)
AND cc_detail.referrer_site = 'CarSiteX'
AND wposts.post_status = 'publish'
AND wposts.post_type = 'post'
AND wposts.post_date < NOW( )
AND cc_detail.year <=2011
AND wposts.post_title NOT LIKE 'Ac%'
AND cc_detail.make != ''
AND cc_detail.model != ''
AND (
cc_detail.price +0
) >100
AND (
wposts.post_date > "2011/01/02 "
)
ORDER BY cc_detail.original_id ASC
LIMIT 30 , 300

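The query works, but the results contain duplicates of the same original_id. CarSiteX lets sellers update an ad, so the same original_id gets imported more than once. How can I get only one row per original_id from cc_posts_detail?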

Here is a sample of the results:

original_id  year  price  make  model  referrer_site  ID  post_author  post_date  post_date_gmt  post_content  post_title  post_excerpt  post_status  comment_status  ping_status  post_password  post_name  to_ping  pinged  post_modified  post_modified_gmt  post_content_filtered  post_parent  guid  menu_order  post_type  post_mime_type  comment_count
1143583  2000  2900  lexus  is200  CarSitex  9633341  1  2011-01-19 05:34:01  2011-01-19 12:34:01     2000 Manual 2.0 Petrol 136k miles NCT  039 d 0...  Lexus Is200 2000     publish  open  open     lexus-is200-2000-        2011-01-19 05:34:01  2011-01-19 12:34:01     0     0  post     0
1149513  1997  2000  mitsubishi  colt  CarSitex  8978523  1  2011-01-05 12:26:01  2011-01-05 19:26:01     1600cc mivec twin cam 16valve. 175 bhp.Four br...  Mitsubishi Colt 1997     publish  open  open     mitsubishi-colt-1997-        2011-01-05 12:26:01  2011-01-05 19:26:01     0     0  post     0
1149513  1997  2000  mitsubishi  colt  CarSitex  9416296  1  2011-01-14 12:04:01  2011-01-14 19:04:01     1600cc mivec twin cam 16valve. 175 bhp.Four br...  Mitsubishi Colt 1997     publish  open  open     mitsubishi-colt-1997-        2011-01-14 12:04:01  2011-01-14 19:04:01     0     0  post     0
1156791  2004  5950  ford  focus  CarSitex  9163527  1  2011-01-08 10:04:01  2011-01-08 17:04:01     2004 FORD FOCUS 1.4 4 DOOR 78333 MILES NCT D 1...  Ford Focus 2004     publish  open  open     ford-focus-2004-        2011-01-08 10:04:01  2011-01-08 17:04:01     0     0  post     0

As you can see, the mitsubishi appears twice.


cc_post_details:

id  int(4) 
referrer_site   varchar(100) 
original_id     bigint(8) 
dealer  varchar(255) 
make    varchar(100) 
model   varchar(100) 
colour  varchar(100) 
year    varchar(8) 
engine_size     int(4) 
mileage     int(4) 
price   int(4) 
location    varchar(100) 
fuel_type   varchar(50) 
body_type   varchar(50) 
transmission    varchar(50) 
doors   int(4) 
image_base_url  varchar(255) 
image_main  text 
image_thumb     text 
post_id     int(4) 
date_added  datetime 
underscore_beepbeep_pos     int(11)

cc_posts:

ID  bigint(20) 
post_author     bigint(20) 
post_date   datetime 
post_date_gmt   datetime 
post_content    longtext 
post_title  text 
post_excerpt    text 
post_status     varchar(20) 
comment_status  varchar(20) 
ping_status     varchar(20) 
post_password   varchar(20) 
post_name   varchar(200) 
to_ping     text 
pinged  text 
post_modified   datetime 
post_modified_gmt   datetime 
post_content_filtered   text 
post_parent     bigint(20) 
guid    varchar(255) 
menu_order  int(11) 
post_type   varchar(20) 
post_mime_type  varchar(100) 
comment_count   bigint(20) 
+3
5


-- SQL Server (T-SQL) syntax. T-SQL has no BEFORE INSERT triggers,
-- so the cleanup runs AFTER INSERT and skips the rows just inserted.
CREATE TRIGGER REMOVE_OLD
ON CC_POST_DETAILS
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Delete existing rows that match a newly inserted row on
    -- original_id, make and model, keeping the new rows themselves.
    DELETE old_row
    FROM CC_POST_DETAILS AS old_row
    JOIN INSERTED AS new_row
      ON old_row.ORIGINAL_ID = new_row.ORIGINAL_ID
     AND old_row.MAKE = new_row.MAKE
     AND old_row.MODEL = new_row.MODEL
    WHERE old_row.ID NOT IN (SELECT ID FROM INSERTED);
END
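
The trigger above uses SQL Server-style syntax (INSERTED, @variables), while the question's query looks like MySQL (an assumption based on LIMIT 30, 300 and NOW()). MySQL does not allow a trigger to modify the table it fires on, so a rough equivalent there would be a cleanup statement run after each import. The sketch below assumes that cc_posts_detail.id is an auto-increment key (so a larger id means a later import) and uses the table name from the question's query rather than the one in the schema listing. Since it deletes rows, it is worth trying on a copy of the table first.

```sql
-- Assumption: id is auto-increment, so for two rows with the same
-- original_id the one with the larger id is the newer import.
DELETE older
FROM cc_posts_detail AS older
JOIN cc_posts_detail AS newer
  ON  newer.original_id   = older.original_id
  AND newer.referrer_site = older.referrer_site
  AND newer.id            > older.id        -- a newer copy exists
WHERE older.referrer_site = 'CarSiteX';
```

This removes only the detail rows. The matching cc_posts rows stay in place, but they drop out of the question's query because its WHERE clause filters on cc_detail columns.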


+1

GROUP BY cc_detail.original_id
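
Simply appending this clause to the original query leaves wposts.* and the other selected columns outside the grouping. MySQL with ONLY_FULL_GROUP_BY enabled rejects such a query, and without it the values returned for those columns come from an arbitrary row of each group. A minimal sketch of a grouped query that stays well defined, with the remaining filters from the question omitted for brevity and the output column names chosen only for illustration:

```sql
SELECT cc_detail.original_id,
       MAX(wposts.post_date) AS latest_post_date,  -- newest version of the ad
       COUNT(*)              AS versions           -- how many times it was imported
FROM cc_posts wposts
JOIN cc_posts_detail cc_detail ON wposts.ID = cc_detail.post_id
WHERE cc_detail.referrer_site = 'CarSiteX'
  AND wposts.post_status = 'publish'
  AND wposts.post_type = 'post'
GROUP BY cc_detail.original_id
ORDER BY cc_detail.original_id ASC;
```

This returns one row per original_id, but only the grouped and aggregated columns, not the full post.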

0

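You have wposts.* in the select (select * is bad). DISTINCT cannot help here: wposts.* includes columns such as post_date and post_date_gmt, which differ between the duplicate rows.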

:

With GROUP BY, apply an aggregate function (max, min, avg ...) to the remaining columns.

- . , , original_id select. original_ids, .
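
To illustrate the point about DISTINCT: it compares every selected column, so it can only collapse two rows when all of those columns match. A sketch that selects just the detail columns from the question's query (other filters omitted for brevity) would merge the two mitsubishi rows from the sample output, since they differ only in the post columns:

```sql
-- Only columns that are identical across re-imports of the same ad.
SELECT DISTINCT cc_detail.original_id, cc_detail.year, cc_detail.price,
       cc_detail.make, cc_detail.model, cc_detail.referrer_site
FROM cc_posts wposts
JOIN cc_posts_detail cc_detail ON wposts.ID = cc_detail.post_id
WHERE cc_detail.referrer_site = 'CarSiteX'
  AND wposts.post_status = 'publish'
  AND wposts.post_type = 'post'
ORDER BY cc_detail.original_id ASC;
```

If the seller changes the price between updates, the two versions differ in price and both rows come back, so this only works when the selected columns really are stable.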

0

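Select * where post.id = latest.postid, with latest found by a subquery (the top 1 row with last.original_id = original_id, comparing post_modified so that only the most recently modified post is kept).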
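
One way to keep only the most recently modified post per original_id is a NOT EXISTS subquery that discards any row for which a newer version exists. This is a sketch in MySQL-style SQL using the table names from the question's query; the aliases newer_post and newer_detail are made up for the example, and the question's remaining filters are omitted for brevity.

```sql
SELECT cc_detail.original_id, cc_detail.year, cc_detail.price,
       cc_detail.make, cc_detail.model, cc_detail.referrer_site, wposts.*
FROM cc_posts wposts
JOIN cc_posts_detail cc_detail ON wposts.ID = cc_detail.post_id
WHERE cc_detail.referrer_site = 'CarSiteX'
  AND wposts.post_status = 'publish'
  AND wposts.post_type = 'post'
  -- Keep a row only if no other post for the same ad was modified later.
  AND NOT EXISTS (
        SELECT 1
        FROM cc_posts newer_post
        JOIN cc_posts_detail newer_detail
          ON newer_post.ID = newer_detail.post_id
        WHERE newer_detail.original_id   = cc_detail.original_id
          AND newer_detail.referrer_site = cc_detail.referrer_site
          AND newer_post.post_modified   > wposts.post_modified
      )
ORDER BY cc_detail.original_id ASC;
```

With the sample rows from the question, only the mitsubishi post dated 2011-01-14 would remain. Two versions with exactly the same post_modified would both be returned, so a tie-breaker on wposts.ID may be needed.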

0

. , ? , 1 !?

, . ur primary_id ?

0

Source: https://habr.com/ru/post/1789141/

