University of Southern Maine
Libraries & Learning

Data Sources: Home

A guide to data sources in many different fields.

Gap Minder

Data Terminology / Glossary


administrative data
Administrative data is data collected from official records.
aggregate data
Aggregate data is data that is representative of a number of individual samples, usually produced by a statistical or mathematical procedure (e.g. frequency counts).
census
A census is a survey that contacts every single entity (e.g. people or organizations) in the population.
cross-sectional data
Cross-sectional data is data gathered at a single point in time.
data
Data is raw input for statistical analysis.
Federal Information Processing Series (FIPS) codes
Federal Information Processing Series (FIPS) codes are standardized numeric or alphabetic codes used to ensure uniform identification of geographic entities across all United States government agencies. They have since been superseded by ANSI codes.
longitudinal data
Longitudinal data follows the same individuals for an extended period of time (e.g. months, years, or decades; typically a year or more).
microdata
Microdata is raw observations (e.g. survey responses) and is thus data rather than statistics.
North American Industry Classification System (NAICS) codes
North American Industry Classification System (NAICS) codes are used by the governments of the United States, Canada, and Mexico to identify industries for statistical purposes.
see sample.
panel data
see longitudinal data.
population
The population is the set of all possible observations or samples.
statistics
Statistics are the results of a statistical analysis.
statistical analysis
Statistical analysis is the process or product of performing an analysis using techniques from the field of statistics.
survey
Surveys are methods of gathering data from a sample of the population.
sample
A sample is a small subset of the population.
unit of analysis
Unit of analysis is the type or level of phenomenon that the researcher is looking to study.
unit of observation
Unit of observation is the type or level of phenomenon about which the data was gathered.
see population.
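
Several of the terms above can be illustrated with a small sketch. The records below are made-up values used only for illustration, and the field names (person_id, year, county, employed) and the history helper are hypothetical. The sketch treats each record as one piece of microdata, takes a cross-section for a single year, derives aggregate data as frequency counts, and follows one individual over time as longitudinal data.

```python
from collections import Counter

# Hypothetical microdata: one record per survey response (made-up values).
microdata = [
    {"person_id": 1, "year": 2014, "county": "Cumberland", "employed": True},
    {"person_id": 2, "year": 2014, "county": "Cumberland", "employed": False},
    {"person_id": 3, "year": 2014, "county": "York", "employed": True},
    {"person_id": 1, "year": 2015, "county": "Cumberland", "employed": True},
    {"person_id": 2, "year": 2015, "county": "Cumberland", "employed": True},
    {"person_id": 3, "year": 2015, "county": "York", "employed": False},
]

# Cross-sectional data: every individual observed at a single point in time.
cross_section_2014 = [row for row in microdata if row["year"] == 2014]

# Aggregate data: frequency counts computed from the individual records.
responses_per_county = Counter(row["county"] for row in cross_section_2014)


def history(person_id):
    """Longitudinal view: the same individual followed across years."""
    rows = [row for row in microdata if row["person_id"] == person_id]
    return [(row["year"], row["employed"]) for row in sorted(rows, key=lambda r: r["year"])]


print(responses_per_county)
print(history(2))
```

In this sketch the unit of observation is the person, since each record describes one respondent. Counting responses per county shifts the unit of analysis to the county.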

Software Tools

Anaconda
Windows, Mac OS X, Linux. Anaconda License (3-clause BSD License).
Anaconda is a completely free Python distribution. It includes more than 300 of the most popular Python packages for science, math, engineering, and data analysis.
Availability: Free.
Rodeo
Mac OS X, Windows, Linux. GNU GPL.
An IDE for data science. Rodeo provides a local graphical user interface for Python, based on IPython/Jupyter Notebook. It depends on a Python environment provided by a package such as Anaconda.
Availability: Free.
Tabula
Windows, Mac OS X, Linux. Freeware/Attribution License.
Tabula is a tool for liberating data tables locked inside PDF files.
Availability: Free.
Excel
Windows, Mac OS X. Commercial.
Excel helps to organize numeric or text data in spreadsheets or workbooks, as well as to reformat and rearrange it. Excel also provides complex analysis tools, including pivot tables.
Availability: Excel 2013 is available on all student computers. It is available for purchase from the campus IT store. A number of Free Software clones are available, including Gnumeric and LibreOffice Calc.
OpenRefine
Windows, Mac OS X, Linux. BSD License.
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
Availability: Free.
KNIME Analytics Platform
Windows, Mac OS X, Linux. GNU GPL.
The KNIME Analytics Platform incorporates hundreds of processing nodes for data I/O, preprocessing and cleansing, modeling, analysis, and data mining, as well as various interactive views, such as scatter plots and parallel coordinates. It integrates all of the analysis modules of the well-known Weka data mining environment, and additional plugins allow R scripts to be run, offering access to a vast library of statistical routines.
Availability: Free.
Processing
Windows, Mac OS X, Linux. GNU GPL.
Processing is a flexible software sketchbook and a language for learning how to code within the context of the visual arts. It also provides a foundation for interactive data visualization.
Availability: Free. Libraries for JavaScript (JS) and Python exist and are available from the site.
R
Windows, Mac OS X, Linux. GNU GPL.
R is an integrated suite of software facilities for data manipulation, calculation and graphical display. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
Availability: Free.
Epi Info
Windows only. Unknown License.
Epi Info™, a suite of lightweight software tools, delivers core ad-hoc epidemiologic functionality without the complexity or expense of large, enterprise applications.
Availability: Free from CDC. Open source community edition available as well.
SAS
Windows, Virtual Machine (for Linux and Mac OS X). Commercial.
SAS is a software suite that can mine, alter, manage, and retrieve data from a variety of sources and perform statistical analysis on it. SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS programming language.
Availability: Available on all Windows-based student computers. A basic version (University Edition) is available for free, but requires registration and a virtual machine, such as VirtualBox. USM students/staff/faculty may purchase a discounted license through IT.
SPSS Statistics
Windows, Linux, Mac OS X. Commercial.
A statistical analysis package used throughout the social sciences.
Availability: Installed on all Windows-based student computers. USM students/staff/faculty may purchase a discounted license through IT. A Free Software clone, called PSPP, is available for Linux and Windows.
Minitab
Windows. Commercial.
Minitab features a complete set of statistical tools, including Descriptive Statistics, Hypothesis Tests, Confidence Intervals, and Normality Tests, as well as advanced statistics tools, including regression and ANOVA. Minitab also helps find the settings that optimize research processes using Factorial, Response Surface, Mixture, and Taguchi experimental design methodologies.
Availability: Available on Windows-based student computers. USM students/staff/faculty may purchase a discounted license through IT.
Windows. Commercial.
An advanced statistics package, comparable to Minitab. Provides an intuitive interface to complex statistical methods.
Availability: Version 12 is free to all UMS students/staff/faculty.
Windows. Commercial.
The de-facto standard commercial GIS. Provides advanced geographic data manipulation, analysis, and modeling tools; as well as map production facilities.
Availability: Available on all Windows-based student computers. USM students may acquire a one-year free license from the GIS Lab.
GeoDa
Windows, Mac OS X, Linux. GNU GPL.
GeoDa is a software tool for exploratory spatial data analysis (ESDA). It is intended to provide a user friendly and graphical interface to methods of descriptive spatial data analysis, such as autocorrelation statistics and indicators of spatial outliers.
Availability: Free.
Windows, Linux, Mac OS X. GNU GPL.
Open Source desktop GIS. Provides tools for spatial data manipulation, analysis, and modeling. Map production facilities are basic.
Availability: Free.
Google Earth Pro
Windows, Mac OS X, Linux. Free.
Google's well-known model earth / digital globe. It can be used as a spatial data creation and presentation tool.
Availability: Pro Version is available on all student computers. Also available for download.
SNAP Toolboxes BEAM Visat
Windows, Mac OS X, Linux. GNU GPL.
A desktop application for visualizing, analyzing, and processing remote sensing raster data (satellite imagery).
Availability: Free.
Web-Based. Freemium.
Provides one of the best out-of-the-box online spatial visualization solutions.
Availability: Free*.
