Data Analytics
A guide to some of the resources available to USM students, faculty, and staff with regard to data analytics.
Introduction to Data Analytics
Data analytics is about examining raw data with the purpose of drawing inferences and conclusions from it. Data analytics can be used to verify or disprove existing models or theories or to improve outcomes by allowing companies and organizations to make better business decisions.
There are several ways to approach data analytics. One is to identify an issue or problem that needs to be addressed and then examine the data for insights into it. Another is a data-first approach: taking the available data and exploring it for new insights, such as how to optimize a process or a pattern that can be used to predict future outcomes.
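As a rough illustration of the data-first approach, the Python sketch below explores a small dataset for a linear pattern and then uses that pattern to predict a new value. The sample figures and the function names (`pearson_r`, `fit_line`, `predict`) are invented for this example and are not taken from any of the tools listed in this guide. It uses only the Python standard library.

```python
from statistics import mean

# Hypothetical sample data, invented for illustration:
# (advertising spend, units sold) pairs.
observations = [(1.0, 12.0), (2.0, 15.0), (3.0, 19.0), (4.0, 22.0), (5.0, 26.0)]


def _sums(pairs):
    """Return the means of x and y, their co-deviation sum, and both squared-deviation sums."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson_r(pairs):
    """Exploration step: how strongly do x and y move together (from -1 to 1)?"""
    _, _, sxy, sxx, syy = _sums(pairs)
    return sxy / (sxx * syy) ** 0.5


def fit_line(pairs):
    """Pattern step: least-squares slope and intercept of y = intercept + slope * x."""
    mx, my, sxy, sxx, _ = _sums(pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def predict(pairs, x_new):
    """Prediction step: apply the fitted line to a new x value."""
    slope, intercept = fit_line(pairs)
    return intercept + slope * x_new


if __name__ == "__main__":
    print("correlation:", round(pearson_r(observations), 3))
    print("slope, intercept:", fit_line(observations))
    print("predicted y at x = 6:", round(predict(observations, 6.0), 1))
```

A strong correlation in the exploration step only suggests that a pattern may exist. Whether it can be trusted for prediction depends on the quality and quantity of the data, which a five-point toy sample cannot show.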
Data analytics is part of the rapidly expanding field of data science. Here is a guide to some of the resources available to USM students, faculty, and staff with regard to data analytics. It is a starting point, not an end point.
Windows, Mac OS X, Linux. Anaconda License (3-clause BSD License). Anaconda is a completely free Python distribution. It includes more than 300 of the most popular Python packages for science, math, engineering, and data analysis. URL:https://www.continuum.io/why-anaconda Availability: Free.
Mac OS X, Windows, Linux. GNU GPL. An IDE for data science. Rodeo provides a local graphical user interface for Python, based on IPython/Jupyter Notebook. It depends on a Python environment provided by a distribution such as Anaconda. URL:http://www.yhat.com/products/rodeo Availability: Free.
Windows, Mac OS X. Commercial. Excel helps to organize numeric or text data in spreadsheets or workbooks, and to reformat and rearrange it. Excel also provides complex analysis tools, including pivot tables. URL:http://products.office.com/en-us/excel Availability: Excel 2013 is available on all student computers. It is available for purchase from the campus IT store. A number of Free Software clones are available, including Gnumeric and LibreOffice Calc.
Windows, Mac OS X, Linux. BSD License. OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data. URL:http://openrefine.org Availability: Free.
Windows, Mac OS X, Linux. GNU GPL. The KNIME Analytics Platform incorporates hundreds of processing nodes for data I/O, preprocessing and cleansing, modeling, analysis, and data mining, as well as various interactive views, such as scatter plots and parallel coordinates. It integrates all of the analysis modules of the well-known Weka data mining environment. Additional plugins allow R scripts to be run, offering access to a vast library of statistical routines. URL:http://tech.knime.org/knime Availability: Free.
Windows, Mac OS X, Linux. GNU GPL. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. URL:http://www.r-project.org Availability: Free.
Windows only. Unknown license. Epi Info™, a suite of lightweight software tools, delivers core ad-hoc epidemiologic functionality without the complexity or expense of large, enterprise applications. URL:http://wwwn.cdc.gov/epiinfo/ Availability: Free from CDC. An open source community edition is available as well.
Windows, Virtual Machine (for Linux and Mac OS X). Commercial. SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it. SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS programming language. URL:http://www.sas.com Availability: Available on all Windows-based student computers. A basic version (University Edition) is available for free, but requires registration and a virtual machine, such as VirtualBox. USM students/staff/faculty may purchase a discounted license through IT.
Windows, Linux, Mac OS X. Commercial. SPSS is a statistical analysis package used throughout the social sciences. URL:http://www.ibm.com/software/analytics/spss/ Availability: Installed on all Windows-based student computers. USM students/staff/faculty may purchase a discounted license through IT. A Free Software clone, called PSPP, is available for Linux and Windows.
Windows. Commercial. Minitab features a complete set of statistical tools, including descriptive statistics, hypothesis tests, confidence intervals and normality tests, as well as advanced statistics tools, including regression and ANOVA. Minitab also allows for the discovery of the settings that optimize research processes using Factorial, Response Surface, Mixture and Taguchi experimental design methodologies. URL:https://www.minitab.com/en-us/products/minitab/ Availability: Available on Windows-based student computers. USM students/staff/faculty may purchase a discounted license through IT.
Windows. Commercial. Systat is an advanced statistics package, comparable to Minitab. It provides an intuitive interface to complex statistical methods. URL:http://www.systat.com/ Availability: Version 12 is free to all USM students/staff/faculty.
Windows. Commercial. ArcGIS is the de facto standard commercial GIS. It provides advanced geographic data manipulation, analysis, and modeling tools, as well as map production facilities. URL:http://www.arcgis.com/ Availability: Available on all Windows-based student computers. USM students may acquire a one-year free license from the GIS Lab.
Windows, Mac OS X, Linux. GNU GPL. GeoDa is a software tool for exploratory spatial data analysis (ESDA). It is intended to provide a user-friendly graphical interface to methods of descriptive spatial data analysis, such as autocorrelation statistics and indicators of spatial outliers. URL:https://geodacenter.asu.edu/software Availability: Free.
Windows, Linux, Mac OS X. GNU GPL. QGIS is an open source desktop GIS. It provides tools for spatial data manipulation, analysis, and modeling. Map production facilities are basic. URL:http://www.qgis.org Availability: Free.
Windows, Mac OS X, Linux. Free. Google Earth is Google's well-known digital globe. It can be used as a spatial data creation and presentation tool. URL:http://www.google.com/earth/ Availability: Pro Version is available on all student computers. Also available for download.
Exploring Big Historical Data by Shawn Graham; Ian Milligan; Scott Weingart
The Digital Humanities have arrived at a moment when digital Big Data is becoming more readily available, opening exciting new avenues of inquiry but also new challenges. This pioneering book describes and demonstrates the ways these data can be explored to construct cultural heritage knowledge, for research and in teaching and learning. It helps humanities scholars to grasp Big Data in order to do their work, whether that means understanding the underlying algorithms at work in search engines, or designing and using their own tools to process large amounts of information.
Demonstrating what digital tools have to offer and also what 'digital' does to how we understand the past, the authors introduce the many different tools and developing approaches in Big Data for historical and humanistic scholarship, show how to use them, what to be wary of, and discuss the kinds of questions and new perspectives this new macroscopic perspective opens up. Authored 'live' online with ongoing feedback from the wider digital history community, Exploring Big Historical Data breaks new ground and sets the direction for the conversation into the future. It represents the current state-of-the-art thinking in the field and exemplifies the way that digital work can enhance public engagement in the humanities.
Exploring Big Historical Data should be the go-to resource for undergraduate and graduate students confronted by a vast corpus of data, and researchers encountering these methods for the first time. It will also offer a helping hand to the interested individual seeking to make sense of genealogical data or digitized newspapers, and even the local historical society who are trying to see the value in digitizing their holdings.
The companion website to Exploring Big Historical Data can be found at http://www.themacroscope.org/. On this site you will find code, a discussion forum, essays, and datafiles that accompany this book.