Navigating Politics: Data Formatting


Data Formatting:

Once you have found the data you are looking for, getting it in a format compatible with the formatting and analysis you plan to perform can be challenging. Yes, PDFs are convenient, but a raw file format like .xls (Microsoft Excel) or .csv (comma-separated values) will save you a lot of trouble when sorting statistics and generating presentations during the data formatting process.
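As a minimal sketch of why a .csv beats a PDF: Python's standard csv module can read the rows directly and sort them by any column, with no copying and pasting. The column names (`candidate`, `party`, `votes`) and the figures below are hypothetical placeholders, not real results.

```python
import csv
import io

# Hypothetical sample data; in practice this would come from a downloaded .csv file
SAMPLE_CSV = """candidate,party,votes
Alice Smith,DEM,1200
Bob Jones,REP,1500
Carol Lee,IND,300
"""

def sort_by_votes(csv_text: str) -> list[dict[str, str]]:
    """Read CSV text and return its rows sorted by the 'votes' column, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # csv yields every value as a string, so convert votes to int before comparing
    return sorted(rows, key=lambda row: int(row["votes"]), reverse=True)
```

To read a real file instead of the sample string, open it with `open("results.csv", newline="")` (a hypothetical filename) and pass the file object to `csv.DictReader`.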

The Shapefile (.shp) format used in cartographic design is a great source of already-formatted datasets. A shapefile is really a set of files: three are mandatory (.shp, .shx, and .dbf), and companions such as .prj are often included. When you download a shapefile (typically as a .zip archive) and open it up, you will see this list of files. The .dbf (dBase) file is a plain table that opens in many spreadsheet and database applications, such as Microsoft Excel or Access. Because its attributes are already arranged in rows and columns, having the .dbf cuts down the time you would otherwise spend arranging cells by hand during data formatting.
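To show that the .dbf really is just rows and columns, here is a minimal sketch of a reader for the classic dBase III layout using only Python's standard library. It assumes plain character fields, ignores code pages and memo files, and skips records flagged as deleted; for real work a maintained library such as pyshp or GDAL is the safer choice.

```python
import struct

def read_dbf(data: bytes) -> list[dict[str, str]]:
    """Minimal dBase III reader: returns each non-deleted record as a dict of strings."""
    # Bytes 4-11 of the header: record count (uint32), header length and
    # record length (uint16 each), all little-endian
    num_records, header_len, record_len = struct.unpack("<IHH", data[4:12])

    # Field descriptors are 32 bytes each, start at offset 32,
    # and end with a 0x0D terminator byte
    fields = []
    pos = 32
    while data[pos] != 0x0D:
        desc = data[pos:pos + 32]
        name = desc[:11].split(b"\x00")[0].decode("ascii")  # null-padded name
        length = desc[16]                                    # field width in bytes
        fields.append((name, length))
        pos += 32

    records = []
    for i in range(num_records):
        start = header_len + i * record_len
        rec = data[start:start + record_len]
        if rec[:1] == b"*":  # first byte '*' marks a deleted record
            continue
        row, offset = {}, 1  # skip the one-byte deletion flag
        for name, length in fields:
            row[name] = rec[offset:offset + length].decode("latin-1").strip()
            offset += length
        records.append(row)
    return records
```

Usage: `read_dbf(open("precincts.dbf", "rb").read())`, where the filename is hypothetical, returns a list of dictionaries, one per row, keyed by column name.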

Probably the most tedious task, once you have the information, is formatting the records for analytic use. If the records in your data table are circularly referenced, expect to spend a few hours reconstructing how the data was originally organized. Each attribute of every data record should be in its own attribute column. For example, sorting a spreadsheet column that contains both a candidate’s name and party affiliation leads to trouble; instead, there should be separate attribute columns for [first_name], [last_name], and [party].
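A minimal sketch of splitting one combined column into [first_name], [last_name], and [party], assuming entries look like `Jane Doe (DEM)`. That layout is a hypothetical example; real files vary, and with this pattern a middle name ends up in last_name, so check the output before trusting it.

```python
import re

# Assumed layout: "First Last (PARTY)"; adjust the pattern to match your own data
COMBINED = re.compile(
    r"^\s*(?P<first_name>\S+)\s+(?P<last_name>.+?)\s*\((?P<party>[^)]+)\)\s*$"
)

def split_candidate(cell: str) -> dict[str, str]:
    """Split one combined cell into first_name, last_name, and party columns."""
    match = COMBINED.match(cell)
    if match is None:
        # Fail loudly rather than silently producing a misaligned row
        raise ValueError(f"unrecognized format: {cell!r}")
    return match.groupdict()
```

Applying `split_candidate` to every cell in the column gives one dictionary per row, which can be written back out with `csv.DictWriter`.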

For larger datasets, learning a little of a query or scripting language (SQL, Visual Basic, Python) pays off. Temporarily hiring a data formatting professional who already knows Microsoft SQL Server is well worth it for reformatting the original, larger dataset or creating subsets from it.
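Here is a minimal sketch of creating a subset with a SQL query. The text mentions Microsoft SQL Server; this example swaps in SQLite, which ships with Python, so it runs anywhere. The `SELECT ... WHERE ... ORDER BY` statement is standard SQL that SQL Server also accepts. The table name `results` and its columns are hypothetical.

```python
import sqlite3

def subset_by_party(rows, party):
    """Load rows into an in-memory SQLite table and select one party's candidates."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE results (last_name TEXT, party TEXT, votes INTEGER)")
        conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
        # The WHERE clause creates the subset; the ? placeholder keeps the query safe
        cursor = conn.execute(
            "SELECT last_name, votes FROM results WHERE party = ? ORDER BY votes DESC",
            (party,),
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

The same query works unchanged against a file-based database; pass a path such as `"election.db"` (hypothetical) to `sqlite3.connect` instead of `":memory:"`.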

One last thing on this note:

Often you have multiple sets of data and need them all together. The best way to merge datasets is on a common field, using a database tool such as Microsoft SQL Server Management Studio. Each set of data should have a common attribute, like “name”, that you can use to join one set to the other; the values must match exactly in both sets for the join to find them.
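A minimal sketch of joining two datasets on a shared “name” field, again using Python's built-in SQLite in place of SQL Server. The two tables (`candidates` and `finance`) and their columns are hypothetical examples.

```python
import sqlite3

def merge_on_name(candidates, finance):
    """Join two datasets on their shared 'name' field (an inner join)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE candidates (name TEXT, party TEXT)")
        conn.execute("CREATE TABLE finance (name TEXT, raised INTEGER)")
        conn.executemany("INSERT INTO candidates VALUES (?, ?)", candidates)
        conn.executemany("INSERT INTO finance VALUES (?, ?)", finance)
        # Rows are matched wherever the name is identical in both tables
        cursor = conn.execute(
            "SELECT c.name, c.party, f.raised "
            "FROM candidates AS c JOIN finance AS f ON c.name = f.name "
            "ORDER BY c.name"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

An inner join keeps only names found in both tables. Switching `JOIN` to `LEFT JOIN` keeps every candidate, with missing finance values returned as `None`.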

Next to come: Visualizing Data