Scatter plot on a large amount of data

Say I have a large dataset (8,500,000 x 50), and I would like to plot X (the date) against Y (the measurement taken on that day).

I could only get this [image not preserved] with the following code:

data_X = data['date_local']
data_Y = data['arithmetic_mean']
data_Y = data_Y.round(1)
data_Y = data_Y.astype(int)
data_X = data_X.astype(int)
sns.regplot(x=data_X, y=data_Y)
plt.show()

According to similar questions I found on Stack Overflow, I can shuffle my data or take, for example, 1000 random values and plot those. But how do I implement this so that each X (the date when a measurement was made) still corresponds to its actual measurement Y?

1 answer

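First, to answer your question:

Use pandas.DataFrame.sample to take a random sample of rows from your DataFrame, then pass the sample to regplot. A small example with random data: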

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import numpy as np
import pandas as pd
import seaborn as sns

dates = pd.date_range('20080101', periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})

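dfSample = df.sample(1000)  # This is the important line: draw 1000 random rows
xdataSample, ydataSample = dfSample["dates"], dfSample["data"]

# Convert the dates to matplotlib date numbers so the regression is fitted on numeric x values
sns.regplot(x=mdates.date2num(xdataSample.astype(datetime)), y=ydataSample)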
plt.show()
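
On the concern in the question about keeping each date matched with its measurement: DataFrame.sample draws whole rows, so the pairing should be preserved. A minimal sketch to check this, assuming the same kind of random data as above (the random_state value and the names check and pairs_match are illustrative, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Same kind of random data as in the example above
dates = pd.date_range("20080101", periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})

# sample() picks whole rows, so each date stays with its own measurement.
# random_state is set only to make the draw repeatable (an assumption for this sketch).
dfSample = df.sample(1000, random_state=0)

# The sampled rows keep their original index labels, so they can be
# looked up in the full DataFrame and compared column by column.
check = df.loc[dfSample.index]
pairs_match = bool(
    (check["dates"] == dfSample["dates"]).all()
    and (check["data"] == dfSample["data"]).all()
)
print(pairs_match)
```

The sample comes back in random order; calling dfSample.sort_values("dates") restores chronological order if that matters for the plot.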

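For regplot I convert the X data because it has a datetime type; depending on your data, this conversion may not be necessary.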
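
Because the converted X values are plain numbers, the axis ticks show numbers rather than dates. One possible way to get readable labels back is sketched below with plain matplotlib instead of regplot. The Agg backend, the scatter call, and the "%Y-%m" format string are assumptions chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

dates = pd.date_range("20080101", periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})
dfSample = df.sample(1000)

# date2num turns datetimes into floats (days since matplotlib's epoch)
xnum = mdates.date2num(dfSample["dates"].to_numpy())

fig, ax = plt.subplots()
ax.scatter(xnum, dfSample["data"], s=5)

# Tell the axis to place ticks on date boundaries and render the floats as dates
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
fig.canvas.draw()    # render once; use plt.show() interactively instead
```

regplot returns the Axes it drew on, so the same two ax.xaxis calls should apply to its result as well.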

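Second, a suggestion:

Use sns.jointplot, which has a kind parameter. From the docs:

kind : { "scatter" | "reg" | "resid" | "kde" | "hex" }, optional

Kind of plot to draw.

With kind="kde" this gives something similar to what matplotlib's hist2d does: a heatmap-like view of the density, built from your entire dataset. An example with random data: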

dates = pd.date_range('20080101', periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})

xdata, ydata = df["dates"], df["data"]
sns.jointplot(x=mdates.date2num(xdata.astype(datetime)), y=ydata, kind="kde")

plt.show()

The resulting plot shows the joint density and also the distribution along each axis.
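
For comparison, the matplotlib hist2d function mentioned above can also be called directly on the full dataset. A rough sketch, assuming the same kind of random data (the bin counts, the colorbar label, and the Agg backend are arbitrary choices for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

dates = pd.date_range("20080101", periods=10000, freq="D")
df = pd.DataFrame({"dates": dates, "data": np.random.randn(10000)})

# hist2d needs numeric x values, so convert the dates first
xnum = mdates.date2num(df["dates"].to_numpy())

fig, ax = plt.subplots()
# Every row is counted into a bin, so no sampling is needed
counts, xedges, yedges, image = ax.hist2d(xnum, df["data"].to_numpy(), bins=(50, 30))

# Show the numeric x axis as dates again
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
fig.colorbar(image, ax=ax, label="points per bin")
fig.canvas.draw()  # render once; use plt.show() interactively instead
```

Since the cost of drawing depends on the number of bins rather than the number of rows, this approach tends to stay readable on very large datasets where a scatter plot turns into a solid blob.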


Source: https://habr.com/ru/post/1681407/

