Unwanted items on the axis when plotting a subset

I have a problem when using a subset of a data.frame in R.

The subset is created and displayed correctly, but when I try to plot it with qplot(), rows that were not selected by subset() also show up on the axis.

The actual file that I am reading is a web server log, but I created a small example to illustrate my problem:

This is the file ItemsSold.csv that I read:

CUST,DT,ITEM,PRICE
BigJoe,10/13/2010,Pickup Truck,20000
TightWad,10/13/2010,USB Drive,12
Jane,10/13/2010,Smart Car,30000
Scrooge,10/13/2010,Gumdrops,1
GeekyMan,10/13/2010,Smart Car,30000

I read this into a data frame as follows:

sales_df <- read.table("C:/R_Expt/ItemsSold.csv", header=TRUE, sep=",")

Then I took a subset to get the rows with a high price as follows:

big_sales_df <- subset(sales_df, PRICE>100)

Printing big_sales_df gives:

      CUST         DT         ITEM PRICE
1   BigJoe 10/13/2010 Pickup Truck 20000
3     Jane 10/13/2010    Smart Car 30000
5 GeekyMan 10/13/2010    Smart Car 30000

So far it looks OK.

When I try to plot it with qplot() as follows:

qplot(nrow, ITEM, data = ddply(big_sales_df, .(ITEM), "nrow"))

I get all the ITEMs on the Y axis, not just Pickup Truck and Smart Car.

The output of ddply() on its own looks right:

ddply(big_sales_df, .(ITEM), "nrow")
          ITEM nrow
1 Pickup Truck    1
2    Smart Car    2

So the counts per ITEM are correct, but qplot() still shows the unselected items on the Y axis.

I also tried it with sqldf():

qplot(NSOLD, ITEM, data = sqldf('select ITEM, count(*) as NSOLD from big_sales_df group by ITEM order by count(*) desc'))

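The result is the same.

So it seems that subset() keeps something from the original data frame, not just the selected rows.

Is there something wrong with how I am using subset()? How do I plot only the rows that are in the subset?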

I suppose I could write the result of subset() to a CSV file and read it back into a new data.frame, but that seems clumsy.


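The issue is that ITEM is a factor, and subsetting a factor keeps all of its levels, even the ones that no longer occur in any row. Take the cut column of the diamonds dataset (shipped with ggplot2) as an example. It has 5 levels.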

unique(diamonds$cut) ## Ideal, Premium, Good, Very Good, Fair

Now take a subset that contains only one of them:

str(subset(diamonds, cut == "Ideal")) ## Look at structure

str() shows that, although only one value is left in the data, the factor still has all 5 levels:

$ cut    : Factor w/ 5 levels "Fair","Good",..: 5 5 5 5 5 5 5 5 5 5 ...
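The same effect can be reproduced on the question's data with base R alone. This is a minimal sketch, assuming a cut-down copy of ItemsSold.csv typed in by hand (the DT column is left out) and assuming ITEM was read as a factor, which was the default for read.table() before R 4.0:

```r
# Rebuild the example data inline; stringsAsFactors = TRUE makes ITEM a factor,
# as read.table() did by default in R versions before 4.0.
sales_df <- data.frame(
  CUST  = c("BigJoe", "TightWad", "Jane", "Scrooge", "GeekyMan"),
  ITEM  = c("Pickup Truck", "USB Drive", "Smart Car", "Gumdrops", "Smart Car"),
  PRICE = c(20000, 12, 30000, 1, 30000),
  stringsAsFactors = TRUE
)
big_sales_df <- subset(sales_df, PRICE > 100)

# Only two distinct items remain in the rows ...
remaining_items <- unique(as.character(big_sales_df$ITEM))
# ... but the factor still carries all four original levels.
item_levels <- levels(big_sales_df$ITEM)
# table() counts per level, so the dropped items show up with a count of 0.
item_counts <- table(big_sales_df$ITEM)

print(remaining_items)
print(item_levels)
print(item_counts)
```

The zero counts in the table() output correspond to the extra entries that end up on the plot axis.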

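Because the levels are still there, the plot draws all of them.

One fix is to rebuild the factor so that only the levels actually present remain. With x holding the subset:

# factor() applied to an existing factor drops the unused levels.
# Passing labels=unique(x$cut) as well is risky: it can attach the wrong labels
# when the values do not first appear in level order.
x$cut <- factor(x$cut)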

In your case:

test <- ddply(big_sales_df, .(ITEM), "nrow")
test$ITEM <- factor(test$ITEM)

Then pass test as the data argument to qplot().


Alternatively, re-create the factors on the subset itself, which drops the unused levels:

big_sales_df$ITEM <- factor(big_sales_df$ITEM)
big_sales_df$CUST <- factor(big_sales_df$CUST)
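Base R also has droplevels(), which does this for every factor column of a data frame in one call. It is not mentioned in the answers above, so treat the following as an alternative sketch, again assuming a hand-typed, cut-down copy of the question's data with ITEM and CUST stored as factors:

```r
# Hand-typed copy of the example data; factors are requested explicitly
# so the sketch behaves the same on R versions before and after 4.0.
sales_df <- data.frame(
  CUST  = c("BigJoe", "TightWad", "Jane", "Scrooge", "GeekyMan"),
  ITEM  = c("Pickup Truck", "USB Drive", "Smart Car", "Gumdrops", "Smart Car"),
  PRICE = c(20000, 12, 30000, 1, 30000),
  stringsAsFactors = TRUE
)

# droplevels() removes unused levels from all factor columns at once.
big_sales_df <- droplevels(subset(sales_df, PRICE > 100))

remaining_levels <- levels(big_sales_df$ITEM)
print(remaining_levels)
print(nlevels(big_sales_df$CUST))
```

Wrapping the subset() call this way means no column has to be fixed by name afterwards.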

Or avoid factors from the start by reading the strings in as plain character columns:

sales_df <- read.csv("ItemsSold.csv", header=TRUE, stringsAsFactors=FALSE)
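A small sketch of why this helps, assuming the CSV contents from the question are supplied through the text argument of read.csv() instead of a file on disk: character columns have no levels, so nothing is left over after subsetting. Since R 4.0, stringsAsFactors = FALSE is already the default.

```r
# The question's CSV, inlined so the sketch does not depend on a file path.
csv_text <- "CUST,DT,ITEM,PRICE
BigJoe,10/13/2010,Pickup Truck,20000
TightWad,10/13/2010,USB Drive,12
Jane,10/13/2010,Smart Car,30000
Scrooge,10/13/2010,Gumdrops,1
GeekyMan,10/13/2010,Smart Car,30000"

sales_df <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
big_sales_df <- subset(sales_df, PRICE > 100)

# ITEM is a plain character vector, so there are no stale levels to carry over.
item_class <- class(big_sales_df$ITEM)
items_to_plot <- unique(big_sales_df$ITEM)

print(item_class)
print(items_to_plot)
```

The trade-off is that any column that should be treated as a factor later has to be converted explicitly.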

Or you can cheat and re-create the factor inline when plotting:

qplot(nrow, factor(ITEM), data = ddply(big_sales_df, .(ITEM), "nrow"))

Source: https://habr.com/ru/post/1770812/
