Administrative data is data collected from official records.
aggregate data
Aggregate data is data that is representative of a number of individual samples, usually produced by a statistical or mathematical procedure (e.g. frequency counts).
census
A census is a survey that contacts every single entity (e.g. people or organizations) in the population.
cross-sectional
Cross-sectional data is data gathered at a single point in time.
data
Data is raw input for statistical analysis.
Federal Information Processing Series (FIPS) codes
Federal Information Processing Series (FIPS) codes are standardized numeric or alphabetic codes used to ensure uniform identification of geographic entities across all United States government agencies. They have now been superseded by ANSI codes.
longitudinal data
Longitudinal data follows the same individuals for an extended period of time (e.g. months, years, or decades; typically a year or more).
microdata
Microdata is raw observations (e.g. survey responses), and is thus data rather than statistics.
North American Industry Classification System (NAICS) codes
North American Industry Classification System (NAICS) codes are used by the governments of the United States, Canada, and Mexico to identify industries for statistical purposes.
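The relationship between microdata and aggregate data defined above can be illustrated with a minimal Python sketch. The records, the field name `employment`, and the helper `frequency_counts` are all hypothetical and chosen only for illustration; the sketch simply turns individual survey responses (microdata) into frequency counts (aggregate data).

```python
from collections import Counter

# Hypothetical microdata: one record per survey respondent.
# These values are invented for illustration, not real survey results.
microdata = [
    {"respondent": 1, "employment": "employed"},
    {"respondent": 2, "employment": "unemployed"},
    {"respondent": 3, "employment": "employed"},
    {"respondent": 4, "employment": "not in labor force"},
    {"respondent": 5, "employment": "employed"},
    {"respondent": 6, "employment": "not in labor force"},
]


def frequency_counts(records, field):
    """Aggregate individual records into a count per category of `field`."""
    return Counter(record[field] for record in records)


# Aggregate data: one number per category instead of one row per person.
aggregate = frequency_counts(microdata, "employment")

for category, count in sorted(aggregate.items()):
    print(f"{category}: {count}")
```

Once the counts are produced, the individual responses can no longer be recovered from them, which is why microdata counts as data while the aggregated counts are statistics.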
Windows, Mac OS X, Linux. Anaconda License (3-clause BSD License). Anaconda is a completely free Python distribution. It includes more than 300 of the most popular Python packages for science, math, engineering, and data analysis. URL:https://www.continuum.io/why-anaconda Availability: Free.
Mac OS X, Windows, Linux. GNU GPL. An IDE for data science. Rodeo provides a local graphical user interface for Python, based on IPython/Jupyter Notebook. It depends on a Python environment provided by a distribution such as Anaconda. URL:http://www.yhat.com/products/rodeo Availability: Free.
Windows, Mac OS X, Linux. Freeware/Attribution License. Tabula is a tool for liberating data tables locked inside PDF files. URL:http://tabula.technology/ Availability: Free.
Windows, Mac OS X. Commercial. Excel helps to organize numeric or text data in spreadsheets or workbooks, as well as to reformat and rearrange it. Excel also provides complex analysis tools, including pivot tables. URL:http://products.office.com/en-us/excel Availability: Excel 2013 is available on all student computers. It is available for purchase from the campus IT store. A number of Free Software clones are available, including Gnumeric and LibreOffice Calc.
Windows, Mac OS X, Linux. BSD License. OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data. URL:http://openrefine.org Availability: Free.
Windows, Mac OS X, Linux. GNU GPL. The KNIME Analytics Platform incorporates hundreds of processing nodes for data I/O, preprocessing and cleansing, modeling, analysis, and data mining, as well as various interactive views, such as scatter plots, parallel coordinates, and others. It integrates all of the analysis modules of the well-known Weka data mining environment. Additional plugins allow R scripts to be run, offering access to a vast library of statistical routines. URL:http://tech.knime.org/knime Availability: Free.
Windows, Mac OS X, Linux. GNU GPL. Processing is a flexible software sketchbook and a language for learning how to code within the context of the visual arts. It also provides a foundation for interactive data visualization. URL:http://processing.org/ Availability: Free. Libraries for JavaScript (JS) and Python exist and are available from the processing.org site.
Windows, Mac OS X, Linux. GNU GPL. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. URL:http://www.r-project.org Availability: Free.
Windows only. Unknown License. Epi Info™, a suite of lightweight software tools, delivers core ad-hoc epidemiologic functionality without the complexity or expense of large, enterprise applications. URL:http://wwwn.cdc.gov/epiinfo/ Availability: Free from CDC. Open source community edition available as well.
Windows, Virtual Machine (for Linux and Mac OS X). Commercial. SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it. SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS programming language. URL:http://www.sas.com Availability: Available on all Windows-based student computers. A basic version (University Edition) is available for free, but requires registration and a virtual machine, such as VirtualBox. USM students/staff/faculty may purchase a discounted license through IT.
Windows, Linux, Mac OS X. Commercial. A statistical analysis package used throughout the social sciences. URL:http://www.ibm.com/software/analytics/spss/ Availability: Installed on all Windows-based student computers. USM students/staff/faculty may purchase a discounted license through IT. A Free Software clone, called PSPP, is available for Linux and Windows.
Windows. Commercial. Minitab features a complete set of statistical tools, including descriptive statistics, hypothesis tests, confidence intervals, and normality tests, as well as advanced statistics tools, including regression and ANOVA. Minitab also allows for the discovery of the settings that optimize research processes using Factorial, Response Surface, Mixture and Taguchi experimental design methodologies. URL:https://www.minitab.com/en-us/products/minitab/ Availability: Available on Windows-based student computers. USM students/staff/faculty may purchase a discounted license through IT.
Windows. Commercial. An advanced statistics package, comparable to Minitab. Provides an intuitive interface to complex statistical methods. URL:http://www.systat.com/ Availability: Version 12 is free to all UMS students/staff/faculty.
Windows. Commercial. The de facto standard commercial GIS. Provides advanced geographic data manipulation, analysis, and modeling tools, as well as map production facilities. URL:http://www.arcgis.com/ Availability: Available on all Windows-based student computers. USM students may acquire a one-year free license from the GIS Lab.
Windows, Mac OS X, Linux. GNU GPL. GeoDa is a software tool for exploratory spatial data analysis (ESDA). It is intended to provide a user-friendly graphical interface to methods of descriptive spatial data analysis, such as autocorrelation statistics and indicators of spatial outliers. URL:https://geodacenter.asu.edu/software Availability: Free.
Windows, Linux, Mac OS X. GNU GPL. Open Source desktop GIS. Provides tools for spatial data manipulation and analysis, as well as modeling tools. Map production facilities are basic. URL:http://www.qgis.org Availability: Free.
Windows, Mac OS X, Linux. Free. Google's well-known model earth / digital globe. It can be used as a spatial data creation and presentation tool. URL:http://www.google.com/earth/ Availability: Pro Version is available on all student computers. Also available for download.