From Course 4, Exploratory Data Analysis, of the Johns Hopkins University Data Science series on Coursera.
Principles of Analytic Graphics
Show comparisons: always ask "compared to what?" (for example, PM2.5 in a house with an air cleaner compared to a house without one)
Show causality, mechanism, explanation: show how you believe the world works (for example, that a child living in a house with lower PM2.5 is more likely to be healthy)
Show multivariate data: more than 2 variables
Integrate multiple modes of evidence: don't let the tools drive the analysis (the plot should follow from your own idea, not from what the tool makes easy)
Describe and document the evidence
Content is king
Take a Look at the Data
library(datasets)
data(airquality)
One dimension
Summary
summary(airquality$Ozone) # ozone
Boxplots
boxplot(airquality$Ozone, col = "blue")
abline(h = 100)
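A histogram is another common one-dimensional view of the same variable. The following is a minimal sketch, assuming the airquality data loaded above; the number of breaks, the colors, and the reference line at the median are arbitrary choices made for illustration.

```r
library(datasets)
data(airquality)

# Ozone contains missing values, so drop them explicitly
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(ozone)
print(five)

# Histogram, with a rug marking each individual observation
hist(ozone, col = "green", breaks = 20,
     main = "Ozone in New York", xlab = "Ozone (ppb)")
rug(ozone)

# Vertical reference line at the median
abline(v = median(ozone), col = "magenta", lwd = 2)
```

The rug makes it easier to see where the observations actually fall inside each bin.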
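# Scatterplot of Ozone against Wind; abline() needs this plot to draw the fitted line on
with(airquality, plot(Wind, Ozone))
model <- lm(Ozone ~ Wind, airquality)
abline(model, lwd = 2)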
Base Plotting Functions
Initialize
plot:initialize a new plot
hist:initialize a new histogram
boxplot:initialize a new boxplot
Add
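lines:add lines to a plot by connecting the given x, y points
abline:add a straight line to a plot (horizontal, vertical, or given by an intercept and slope)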
points:add points to a plot
text:add text labels to a plot using specified x, y coordinates
title:add annotations to x, y axis labels, title, subtitle, outer margin
mtext:add arbitrary text to the margins
axis:add axis ticks and labels
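The "initialize, then add" workflow listed above can be sketched as follows. This is only an illustration, assuming the airquality data; the label text, its coordinates, and the colors are made up for the example.

```r
library(datasets)
data(airquality)

# Keep only the rows where both variables are observed
aq <- airquality[complete.cases(airquality[, c("Wind", "Ozone")]), ]

# Initialize an empty plot region without default axes or annotation
with(aq, plot(Wind, Ozone, type = "n", axes = FALSE, ann = FALSE))

# Add elements one call at a time
with(aq, points(Wind, Ozone, pch = 19, col = "steelblue"))
fit <- lm(Ozone ~ Wind, data = aq)
abline(fit, lwd = 2)                         # fitted regression line
with(aq, lines(lowess(Wind, Ozone), lty = 2)) # dashed smooth curve
text(15, 150, "higher wind, lower ozone")     # label placed at x = 15, y = 150
axis(1)                                       # x axis
axis(2)                                       # y axis
box()                                         # frame around the plot region
title(main = "Ozone and Wind", xlab = "Wind (mph)", ylab = "Ozone (ppb)")
mtext("airquality data", side = 3, line = 0.3, cex = 0.8)
```

Each call after plot() only annotates the existing plot, so the order of the calls determines what is drawn on top.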
Some Important Base Graphics Parameters
pch:the plotting symbol
lty:the line type
lwd:the line width
col:color
xlab:character string for the x-axis label
ylab:character string for the y-axis label
las:the orientation of the axis labels
bg:the background color
mar:the margin size
oma:the outer margin size
mfrow:number of plots per row, column (plots are filled row-wise)
mfcol:number of plots per row, column (plots are filled column-wise)
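A short sketch of how several of these parameters are used together, assuming the airquality data. Global settings go through par(), while pch, lty, lwd, and col are passed to the individual plotting calls; the particular values here are arbitrary.

```r
library(datasets)
data(airquality)

# par() returns the previous settings, so they can be restored afterwards
old <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1), las = 1, bg = "white")

# Left panel: filled triangles (pch = 17) in red
with(airquality, plot(Temp, Ozone, pch = 17, col = "red",
                      xlab = "Temperature (F)", ylab = "Ozone (ppb)"))

# Right panel: thick dashed line (lty = 2, lwd = 2) for the month of May
may <- subset(airquality, Month == 5)
with(may, plot(Day, Temp, type = "l", lty = 2, lwd = 2,
               xlab = "Day of May", ylab = "Temperature (F)"))

# Restore the previous settings
par(old)
```

Saving and restoring the old settings keeps later plots from inheriting the two-panel layout.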
Default parameters:
par("bg") # e.g. "white" or "transparent", depending on the graphics device
par("mar") # 5.1 4.1 4.1 2.1 (bottom, left, top, right)
Examples
Example:Legend
with(airquality, plot(Wind, Ozone, main = "Ozone and Wind in New York", type = "n"))
with(subset(airquality, Month == 5), points(Wind, Ozone, col = "blue"))
with(subset(airquality, Month != 5), points(Wind, Ozone, col = "red"))
legend("topright", pch = 1, col = c("blue", "red"), legend = c("May", "Other Months"))
Example:Multiple Base Plots
par(mfrow = c(1, 3), mar = c(4, 4, 2, 1), oma = c(0, 0, 2, 0))
with(airquality, {
  plot(Wind, Ozone, main = "Ozone and Wind")
  plot(Solar.R, Ozone, main = "Ozone and Solar Radiation")
  plot(Temp, Ozone, main = "Ozone and Temperature")
  mtext("Ozone and Weather in New York", outer = TRUE)
})
The Lattice System
Lattice:the entire plot is specified by a single function call
useful for plotting high-dimensional data (conditioning plots)
unlike base plotting, which draws directly to the graphics device, a lattice function returns an object of class trellis (which is auto-printed at the console)
representative:xyplot()
Two packages
lattice (including xyplot, bwplot, levelplot, etc.)
grid (usually called indirectly through lattice or ggplot2)
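The difference from base graphics is easiest to see by storing the result of a lattice call. A minimal sketch, assuming the airquality data; the variable name p is just for illustration.

```r
library(datasets)
library(lattice)

# Nothing is drawn here: the call only builds a trellis object
p <- xyplot(Ozone ~ Wind, data = airquality)
class(p)   # "trellis"

# Drawing happens when the object is printed
print(p)
```

Auto-printing only happens at the top level of an interactive session. Inside a function, a loop, or a sourced script, print() has to be called explicitly for the plot to appear.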
Lattice Functions
xyplot:create scatterplots
bwplot:box-and-whiskers plots
histogram:histograms
stripplot:like a boxplot but with actual points
dotplot:plot dots on "violin strings"
splom:scatterplot matrix(like pairs in base plotting)
levelplot, contourplot:for plotting "image" data
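Two of the functions listed above, sketched on the airquality data. The data frame name aq and the month labels are introduced only for this example.

```r
library(datasets)
library(lattice)

# Treat Month as a factor so it can serve as a grouping / conditioning variable
aq <- transform(airquality, Month = factor(Month, labels = month.name[5:9]))

# Box-and-whisker plot of ozone for each month
print(bwplot(Ozone ~ Month, data = aq))

# One histogram panel of temperature per month
print(histogram(~ Temp | Month, data = aq, layout = c(5, 1)))
```

All lattice functions share the same formula interface: y ~ x | f reads as "y against x, one panel per level of f".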
Examples
Example:xyplot
library(datasets)
library(lattice)
state <- data.frame(state.x77, region = state.region)
xyplot(Life.Exp ~ Income | region, data = state, layout = c(4, 1))
Example:panel function
library(lattice)
set.seed(10)
x <- rnorm(100)
f <- rep(0:1, each = 50)
y <- x + f - f * x + rnorm(100, sd = 0.5)
f <- factor(f, labels = c("group1", "group2"))
xyplot(y ~ x | f, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.lmline(x, y, col = 2)
})
The ggplot2 System
ggplot2:mixes elements of base and lattice
From the ggplot2 book: "In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (color, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system."
representative:qplot(), ggplot()
Basic Components of a ggplot2 Plot
a data frame
aesthetic mapping:how data are mapped to color, size
geoms:points, lines, shapes
facets:for conditional plots
stats:statistical transformations such as binning, quantiles, smoothing
scales:what scale an aesthetic mapping uses (for example, male = red, female = blue)
coordinate system
library(ggplot2)
data(mpg)
g <- ggplot(mpg, aes(displ, hwy))
summary(g)
g + geom_point()
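The components listed above map onto layers that are added with +. The following is a sketch only, assuming the mpg data shipped with ggplot2; the choice of drv as the color and facet variable, the axis labels, and the y range are illustrative.

```r
library(ggplot2)

# Data frame plus the base aesthetic mapping (x = displ, y = hwy)
g <- ggplot(mpg, aes(displ, hwy))

p <- g +
  geom_point(aes(color = drv), alpha = 0.6) +               # geom + aesthetic: drive type -> color
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # stat: linear smoother
  facet_grid(. ~ drv) +                                     # facets: one panel per drive type
  labs(x = "Engine displacement (L)", y = "Highway mpg",
       title = "Highway mileage by engine size") +
  coord_cartesian(ylim = c(10, 45))                         # coordinate system with a fixed y range

print(p)
```

As with lattice, the plot is an object. It is only drawn when printed, so layers can be added step by step before anything is displayed.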