
Rho Knows Clinical Research Services

A Hands-Free SAS Axis Macro

Posted by Brook White on Mon, Dec 01, 2014 @ 03:41 PM

When creating charts and graphs, it is easy to focus on the style and layout of the core data elements but overlook the layout of the axes.  It's an honest mistake, but the result can be painful to look at.  A great visualization can be ruined with poorly designed axes - a bit like wearing a tuxedo with sweatpants.  

While aesthetics are important, the real threat with poor axes is the potential for miscommunication.  Axes set the parameters of our visualization; they provide the context in which we interpret the data.  If they are not carefully designed, the data may be misleading or outright deceptive, as in the example below.

Example: Poor chart axes


This chart is short on information but long on confusion.  Why does the y-axis begin at 50?  On this scale, the differences between groups appear more dramatic than they would on a scale that (more appropriately) begins at 0.  Where does the y-axis end?  Does column A run off the chart or end in the visible space?  This chart suffers from numerous shortcomings, but the poorly designed axes pose the most glaring challenges to our ability to interpret the data correctly.

Given the pitfalls that come with designing effective axes, many charting programs try to automate axis settings to take the burden of set up off the user.  While this is well-intentioned and can work for very basic figures, we often find that the default settings require multiple modifications to create useful figures. 

Such is the case with two popular SAS graphing tools - SAS/GRAPH and ODS Graphics. There are many situations in which the default axes produced by SAS/GRAPH and ODS Graphics cannot be used as is. 

Perhaps there are too many or too few tick marks for our tastes. Perhaps we need the axis ranges to be identical across a set of graphs for cross-chart comparisons. Perhaps we have a reference value that we wish to display, but this value is outside of the observed data range. For these reasons and many more, we often need to modify our default axis ranges. 

Modifying axis ranges in SAS is typically an intensive manual process. First we have to review our graphs, decide what new axis ranges to use, and then type them into the programs. This is not just a one-time challenge, however. With each update to our data set, we run the risk of our previously-specified ranges being out of sync with the new data. Both the cost and risk associated with this manual process are unacceptable. 

To aid with axis creation in SAS, Rho's Center for Applied Data Visualization developed the %FigAxisOrder SAS macro to create axis ranges based on user-defined criteria. The macro scans the data to be graphed and produces recommended minimum, maximum, and increment values for the axis. By default, the macro tries to duplicate the axes that ODS Graphics would produce. The programmer can then customize the macro's behavior further by specifying multiple variables to consider simultaneously, specific values that should be included in the axis range, a desired number of tick marks to display, and more. The end result is an axis range that is data-driven, user-configurable, and, after the initial setup, maintenance-free.
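To make the idea of a data-driven axis concrete, here is a minimal sketch in Python (not SAS) of how minimum, maximum, and increment values might be derived from the data. It is not the %FigAxisOrder macro or its algorithm; the function names `nice_step` and `axis_order` and the parameters `include` and `target_ticks` are hypothetical, chosen only for illustration, and the rounding to 1, 2, or 5 times a power of ten is a common convention assumed here.

```python
import math


def nice_step(raw_step):
    """Round a raw step up to 1, 2, 5, or 10 times a power of ten."""
    exponent = math.floor(math.log10(raw_step))
    base = 10 ** exponent
    fraction = raw_step / base
    for nice in (1, 2, 5, 10):
        # Small tolerance guards against floating-point noise.
        if fraction <= nice * (1 + 1e-9):
            return nice * base
    return 10 * base


def axis_order(values, include=(), target_ticks=5):
    """Suggest (axis_min, axis_max, step) for the data.

    `include` lists values (such as a reference line or zero) that must
    fall inside the axis range even if they lie outside the data.
    `target_ticks` is the approximate number of intervals wanted.
    """
    everything = list(values) + list(include)
    lo, hi = min(everything), max(everything)
    if lo == hi:
        # Assumption: widen a degenerate range by one unit on each side.
        lo, hi = lo - 1, hi + 1
    step = nice_step((hi - lo) / target_ticks)
    axis_min = math.floor(lo / step) * step
    axis_max = math.ceil(hi / step) * step
    return axis_min, axis_max, step


observed = [52, 61, 58, 75]
print(axis_order(observed))               # (50, 75, 5)
print(axis_order(observed, include=[0]))  # (0, 80, 20)
```

Because the range is recomputed from the data on every run, an updated data set yields an updated axis without hand-editing. Forcing zero into the range through `include` also avoids the truncated y-axis criticized in the example chart above.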

The %FigAxisOrder SAS macro and accompanying documentation are freely available here on Rho's public graphics-sharing website: graphics.rhoworld.com.

You can read more about Rho's Center for Applied Data Visualization here.

Introducing Rho's Center for Applied Data Visualization

Posted by Brook White on Fri, Nov 21, 2014 @ 10:02 AM

Our industry is driven by data. Every phase of our trials requires us to collect, monitor, analyze, and report data. While each of these steps is equally important, reporting is arguably the most impactful step. When we report data, we give them to key decision-makers and invite them to interpret the data and draw conclusions.
Is the trial being conducted correctly? Is participant enrollment on schedule? Are we protecting our participants' safety? Was the investigational product effective? Was our hypothesis confirmed? We rely on effective data reporting to answer these questions.
Unfortunately, our industry doesn't always use the best tools or practices when it comes to data reporting. If you've ever had to make sense of 50 pages of data listings or spend hours creating figures using spreadsheet software, you know what we mean. If these methods feel outdated, it's because they are. We have been using the same basic technologies to report data for the past few decades with little improvement. The good news is that there are plenty of alternatives available to us and our industry is ripe for change.
Granted, some of the formats for reporting are mandated by formal regulations. We may not be able to do much about these reports, but many of the methods we use to report data are left up to us as clinical researchers. As such, we argue that clinical researchers have a responsibility to do the data justice and communicate them as clearly and effectively as possible.

What does this mean for our industry? It means looking for newer and better ways to communicate data. It means thinking carefully about how the method of reporting impacts perception and comprehension of data. It means researching novel technology tools for sharing data.
At Rho, we took these challenges to heart and created a new Center for Applied Data Visualization (ADV) to research and promote the best practices and tools for visualizing and reporting data. The ADV was founded by a team of senior biostatisticians, web programmers, and a study coordinator who have years of experience directly supporting clinical trials. This firsthand experience with clinical research gave our team a unique perspective on the data reporting needs at all stages of clinical research, from study design through participant enrollment, monitoring, data collection, analysis, and data exploration to publication and reporting. Hence, the ADV marries clinical research experience with the technical skillset to create innovative, cutting-edge data visualizations in support of our research projects. Moreover, the ADV provides trainings throughout the company on graphics best practices and tool development.
In support of our projects, the ADV has developed dozens of novel graphics for both static reports and interactive web-based use. In both cases, the response from our clients and research partners has been overwhelmingly positive. Beginning this month, the ADV is expanding its focus to also provide resources external to Rho. Members of the ADV have been presenting their work and tools in public forums for years, but now we are moving toward releasing some of our tools, resources, and graphics research as open source (free to use) on our new graphics-sharing website: graphics.rhoworld.com. The site currently has two tools available, and additional graphics will be posted, and discussed here, on a regular basis.

Data visualization has tremendous potential to improve the way we communicate, understand, and interact with data. If you would like to learn more about Rho’s data visualization work, we would love to hear from you at: graphics@rhoworld.com


I Swarm, You Swarm, We All Swarm for Beeswarm (Plots)

Posted by Brook White on Fri, Nov 21, 2014 @ 10:02 AM

In a recent blog post, we introduced you to Rho's Center for Applied Data Visualization (ADV).

One of the ADV's goals is to share some of our data visualization tools open source online. The first tools we are releasing are a set of statistical programming packages that will create "beeswarm plots." We are providing 3 examples using a SAS macro and 2 R code examples that the ADV developed.

What is a beeswarm plot?

Imagine that you have a continuous data variable that you want to compare between two different treatment groups (e.g., change in blood glucose levels for group 1 vs. group 2). One way to plot these data would be to create a strip plot, like this:

strip plot

As you can see, one potential problem with a strip plot is that you could have a very dense grouping of data points, leading to data points being plotted on top of one another on the chart and obscuring the data.

One technique to avoid this overplotting is to apply a random jittering effect, which uses an algorithm to randomly move the data points away from one another a little so you can better see the data, like so:

random jittering to address overplotting

This is better, but it's still not a great display. The random jitter effect does not guarantee that overplotting will be avoided, and it often results in some points being moved unnecessarily.

A beeswarm plot improves upon the random jittering approach by moving each data point only the minimum distance needed to keep it from overlapping its neighbors.

The result is a plot where you can see each distinct data point, like so:

beeswarm plot

It looks a bit like a friendly swarm of bees buzzing about their hive. More importantly, each data point is clearly visible and no data are obscured by overplotting.
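For readers curious how such a layout can be computed, below is a minimal greedy sketch in Python. It is not the ADV's SAS macro or R code, and the function name `beeswarm_offsets` and its `diameter` parameter are hypothetical. It also assumes, for simplicity, that the horizontal offsets and the data values share the same units; a real implementation would work in plot coordinates. Points are placed in order of value, and each one takes the horizontal position closest to the group's center line that does not overlap any marker already placed.

```python
import math


def beeswarm_offsets(values, diameter=1.0):
    """Return one horizontal offset per value so that no two markers overlap.

    Assumption: markers are circles of the given diameter, and offsets are
    measured in the same units as the values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    placed = []  # (x, y) centers of markers already positioned
    offsets = [0.0] * len(values)
    for i in order:
        y = values[i]
        # Only markers within one diameter vertically can collide.
        neighbors = [(px, py) for px, py in placed if abs(py - y) < diameter]
        # Candidates: the center line, plus the positions just touching
        # each neighbor on its left and right.
        candidates = [0.0]
        for px, py in neighbors:
            dx = math.sqrt(diameter ** 2 - (py - y) ** 2)
            candidates += [px + dx, px - dx]
        candidates.sort(key=abs)  # try the smallest displacement first
        for x in candidates:
            if all((x - px) ** 2 + (y - py) ** 2 >= diameter ** 2 - 1e-9
                   for px, py in neighbors):
                offsets[i] = x
                break
        placed.append((offsets[i], y))
    return offsets


print(beeswarm_offsets([1.0, 1.0, 1.0]))   # [0.0, 1.0, -1.0]
print(beeswarm_offsets([0.0, 5.0, 10.0]))  # [0.0, 0.0, 0.0]
```

Unlike random jitter, well-separated points are not moved at all, and tied points are pushed apart by exactly one marker width. The greedy order keeps the sketch short but does not guarantee the most compact swarm possible.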

Check out the plots, code, and other technical details at our new public graphics-sharing website: graphics.rhoworld.com.

If you would like to learn more about Rho’s data visualization work, we would love to hear from you at: graphics@rhoworld.com.

Otherwise, stay tuned for more graphics releases to come.
