To state the obvious, there is a plethora of great information on Data.gov. For some, it can seem a bit overwhelming to get started with the datasets, which are available to anyone. Here are three areas you can start with and hopefully build on as you explore the vast amount of data.
To start, you should understand what Data.gov is trying to do and why it's there in the first place:
The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. As a priority Open Government Initiative for President Obama’s administration, Data.gov increases the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government. Data.gov provides descriptions of the Federal datasets (metadata), information about how to access the datasets, and tools that leverage government datasets. The data catalogs will continue to grow as datasets are added. Federal, Executive Branch data are included in the first version of Data.gov.
If you have a little time, browse the Data.gov FAQs, since many basic questions are answered there.
Now that you have a basic understanding of what Data.gov is about, check out these three main areas:
Data.gov Wiki (http://data-gov.tw.rpi.edu/wiki)
The Data-gov Wiki was initiated by Jim Hendler with Li Ding and is co-led by Deborah McGuinness. The wiki contains:
- Data.gov Catalog – List of datasets from Data.gov
  - Complete – All Data.gov datasets
  - OGD Only – Open Government Directive Data.gov datasets only
- Demos – live demos using linked gov data
- Tutorials – tutorials for open gov developers
- Issues – Issues in gov data
- RSS – recently updated gov data.
- Semantic Search – RDFa-based search engine for the wiki
- Source Code – the wiki's code is available to all.
Data.gov Tutorials (http://data-gov.tw.rpi.edu/wiki/Tutorials)
Data.gov Tutorials contain:
- Data-gov Insights
- Other Resources
Data.gov Complete Data Catalog (http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete)
There are 4,316 datasets published at Data.gov (see listing). There are 4,444 data files mentioned in those datasets (see listing). Starting from the Data.gov datasets in CSV/TXT format, the wiki team has generated 417 RDF datasets (covering the content of 410 Data.gov datasets), contributing 6,418,927,209 triples. Note: datasets that are each a subset of a Category:Converted Dataset were skipped, and this list does not include datasets from the Geodata Catalog.
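The general idea behind turning a CSV dataset into RDF triples can be shown with a minimal Python sketch. Each CSV row becomes a subject, each column header becomes a predicate, and each non-empty cell becomes a literal object, written out as N-Triples. This is not the converter the Data-gov Wiki team used. The `example.org` namespace, the `entryN` subject naming, and the `csv_to_ntriples` and `escape_literal` functions are hypothetical, chosen only for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration only; not the URIs used by the Data-gov Wiki.
BASE = "http://example.org/dataset/"


def escape_literal(value):
    """Escape a string so it can appear inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, dataset_id):
    """Map each CSV row to a subject URI and each non-empty cell to one triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_num, row in enumerate(reader, start=1):
        subject = f"<{BASE}{dataset_id}/entry{row_num}>"
        for column, value in row.items():
            # Skip overflow cells (DictReader stores them under key None) and empty or missing values.
            if column is None or not value:
                continue
            # The column header becomes a predicate URI: spaces -> underscores, the rest percent-encoded.
            local_name = quote(column.strip().replace(" ", "_"))
            predicate = f"<{BASE}{dataset_id}/vocab/{local_name}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    sample = "state,population\nOhio,11536504\nIowa,3046355\n"
    for line in csv_to_ntriples(sample, "pop"):
        print(line)
```

Counting the returned lines gives a triple count, the same kind of figure the wiki reports for its converted datasets. A real conversion would also type the literals (numbers, dates) and reuse shared vocabularies rather than minting a predicate per column.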
Joshua is working to become a Data Scientist with a focus on Analytics, Big Data, Machine Learning, and Statistics. His passion for Data and Information is second to none. He is a certified IBM Cognos Expert with more than 10 years of experience in Business Intelligence & Data Warehousing, Analytics, IT Management, Software Engineering, and Supply Chain Performance Management with Fortune 500 companies. He has specializations in Analytics, Mobile Reporting, Performance Management, and Business Analysis.