A Reporter’s Guide to Excel
For education journalists with basic and intermediate Excel skills
Published April 2016
Reporters use Microsoft Excel to analyze data to look for trends, anomalies and story ideas. Databases are full of information broken down into rows and columns. This guide will teach skills that are needed to clean and analyze databases to extract information for story use. Remember, there are stories in the data.
Use this guide as an introduction to the Excel concepts you’ll find most helpful in your reporting, and remember to be patient. Many Excel tricks require some practice, but you’ll see the payoff quickly. And if you make a mistake, just hit “control+z” to undo a step.
Accompanying this guide is a set of spreadsheets that you can use to practice the skills you’re about to learn. The examples in this guide mirror the examples in the spreadsheets.
Get the Data
The first step is to get the data. State laws vary, but a basic request should detail specifically what is needed and the time frame sought. Also, always request the information in electronic format, such as Excel. And make sure to request the record layout, which will define the information in each column. Some column headers consist of abbreviations, and you don’t want to guess them.
Basic Excel Organizing Tools
Excel is made up of columns and rows. Columns have letters and are vertical. Rows have numbers and are horizontal. Consider this the latitude and longitude of data mining.
Excel allows you to manipulate data in time-saving ways, but it helps to know the basic editing and organizing functions to quicken the pace of your analysis. In some cases, applying a simple edit to your cells can eliminate the need for more involved formulas. In other cases, you’ll want your data organized in certain ways before advancing to the formulas we’ll explore in this guide.
Sort: This is the fastest way to rank data in a column, and it works for text and numbers. At its most basic, the Sort option organizes rows of data by one or more columns, numerically or alphabetically. To do a quick sort in Excel, right click anywhere in the column you want to sort and choose Sort from the menu. [See example ‘SORT’ below]
To sort by more than one level, do a customized sort. Under the home tab, click on Sort & Filter and then Custom Sort.
Filter: The Filter is sorting on steroids. In addition to organizing rows in one or more columns by their alphabetical or numerical order, this function allows you to limit the data on display for more targeted analysis. For example, if you have a district employee spreadsheet and one of the columns identifies the employee type (principal, teacher, librarian, etc.), you’ll be able to use the Filter function to select as many or as few types of employees as your analysis requires. To turn on the filter, click on the Home tab and go to Sort & Filter. Click on Filter, and you will see drop-down arrows on the column headers. For more in-depth filters, click a drop-down arrow and go to “Date Filters” or “Text Filters.”
Format Cells Using ‘Home’ Tab Features
To format a cell or column, right click on the cell or highlight the column. Go to Format Cells. Under the Number tab, you can specify the format of the cell as general, text, a number, a date, currency, etc., or customize it. The Alignment tab allows you to align text, wrap text, merge cells, etc. You can add a border to your spreadsheet under the Border tab. You’ll want to format your cells regularly as some formulas call for text, while others call for numbers. Knowing what the formulas call for is part of the fun of Excel!
To freeze the header row, go to the View tab and click on Freeze Panes. (Mac users follow these steps.) Three options allow you to freeze the top row, freeze the first column, or designate a combination of rows and columns to freeze. As you become more acquainted with Excel, this function will likely stand out as one of the most useful. In spreadsheets with hundreds of rows and dozens of columns, it’s hard to keep track of what each row or column represents unless the header (name) of each row or column stays in place as you scroll through your spreadsheet. Notice in the example below that the top row is numbered “1” but the following rows start at number “1583.” As you scroll through the data, the “freeze pane” (in this case, the top row) stays in place so that you don’t lose track of the headers.
Transpose: This function allows you to flip columns into rows and vice versa in Excel. Here is the easiest way to transpose information: Highlight and copy the cells you want transposed. Once copied, go to the cell where you want your transposed information to appear. Right click on the empty cell and click on Paste Special in the menu. Next, check the Transpose box near the bottom and hit OK.
Add/Delete rows and columns
To add rows and columns, highlight the area where you want the new rows or columns. Right click, and click Insert. If you want to add five rows/columns, then highlight five rows/columns and then click Insert. To delete rows/columns, highlight the rows/columns to delete. Then, right click and hit Delete.
Adjust column width
There are a couple of ways to adjust the column width. The easiest way is to left click on the line between two column letters at the top of the sheet and drag it to where you want it. You can also highlight the column, right click, and then click Column Width. Enter the desired column width and hit OK.
A good starting tip with functions is knowing what formula was used to arrive at an answer. If you see a number in a cell, click on it to see if that number was derived from a mathematical function, like the ones you’ll learn about in this section. This is also good to keep in mind as you gather more information through formulas and want to double-check how you arrived at those figures or calculations.
To add two cells, select the cell in which you want the total to appear and type: =b2+b3.
To add a column of numbers use: =sum(b2:b10) [See Example 1]
You can either type out the range “b2:b10” or move your cursor to select the part of the column you wish to apply to your function, in this case the ‘sum’ function. Practice dragging your cursor along the cells you want to add; you’ll notice that Excel applies a light color to the cells you’re pulling into your equation. This trick can be applied to nearly all functions.
To subtract one cell from another, use: =b3-b2 (which in Example 1 above would result in 432-267)
Median: The middle number in a group of numbers. When the group has an even count, the median is the midpoint of the two middle numbers. For example, the median of 2, 3, 3, 5, 7 and 10 is 4, halfway between 3 and 5. Use the median instead of the average when numbers in a set differ greatly. For example, finding the median salary for the Dallas Mavericks players would be more useful than finding the average salary, as the star player makes millions more than all other players and his salary would skew the data. Using Example 1, here’s how you find the median: =median(b2:b10)
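For readers who want to double-check a median outside Excel, here is a minimal Python sketch. It uses the six numbers from the text; the salary list is invented purely to illustrate how one very large value pulls the average but not the median.

```python
from statistics import mean, median

values = [2, 3, 3, 5, 7, 10]  # the sample numbers from the text
# With an even count, the median is the midpoint of the two middle values (3 and 5).
print(median(values))  # 4.0

# Hypothetical salaries in millions of dollars: one star player skews the average.
salaries = [1, 1, 2, 2, 3, 30]
print(mean(salaries))    # 6.5
print(median(salaries))  # 2.0
```

The average of 6.5 describes no one on this made-up roster, while the median of 2 sits where most of the salaries are.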
Mode: The most frequently occurring number in a group of numbers. For example, the mode of 2, 3, 3, 5, 7, and 10 is 3. Mode can be used to highlight important information in data, such as pointing out that the most common teacher salary is $47,000 a year. (The most common value is not necessarily a majority; it only has to appear more often than any other value.) Using Example 1, here’s how you find the mode: =mode(b2:b10)
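A short Python sketch of the same idea, assuming an invented salary list, shows that the most common value can account for well under half of the group.

```python
from collections import Counter
from statistics import mode

values = [2, 3, 3, 5, 7, 10]  # the sample numbers from the text
print(mode(values))  # 3

# Hypothetical teacher salaries, for illustration only.
salaries = [47000, 47000, 47000, 52000, 55000, 61000, 38000, 41000]
most_common_salary, count = Counter(salaries).most_common(1)[0]
share = count / len(salaries)
print(most_common_salary, f"{share:.1%}")  # 47000 37.5%
```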
The formula to find the average is =average(b2:b10)
To find the maximum number in Example 2, the formula is =max(a2:a23). The maximum is 22.
To find the minimum number, the formula is =min(a2:a23)
Countif: Counts the number of cells that meet the criteria that you specify. In Example 2, find the number of people over age 20 with this formula (note the quotation marks, which must be straight quotes when typed into Excel): =countif(a2:a23,">20"). The answer is 2. The countif formula also can be used for text, such as the number of times a name appears in a column.
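The counting logic can be sketched in Python as a cross-check. The ages and names below are made up for illustration and are not the values in Example 2.

```python
# Hypothetical ages; "over 20" excludes anyone who is exactly 20.
ages = [18, 22, 19, 17, 21, 20, 16]
over_20 = sum(1 for age in ages if age > 20)
print(over_20)  # 2

# Counting how many times a piece of text appears in a column.
names = ["Garcia", "Smith", "Lee", "Smith"]
print(names.count("Smith"))  # 2
```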
Calculating percentages in Excel. Example 3 shows how to find the percentage that each county in Texas constitutes of the state. The formula is =b242/b257 with b257 being the total that you’d use as your denominator. Notice that in the example you first needed to add all the values in the range in column B, where it says “total.”
Instead of writing the percentage formula for each cell, use the copy function by dragging or double-clicking. But you will get error messages in those cells unless you anchor the denominator, or hold it constant. Do this in Example 3 by adding a dollar sign to the formula: =b242/b$257
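As a rough illustration of why the denominator has to stay fixed, here is a Python sketch with invented county figures. The single `state_total` variable plays the role of the anchored cell: every county is divided by the same total.

```python
# Hypothetical county populations, not the figures in Example 3.
county_population = {"County A": 50_000, "County B": 150_000, "County C": 300_000}

# One fixed denominator, reused for every row (the job the dollar sign does in Excel).
state_total = sum(county_population.values())

shares = {name: pop / state_total for name, pop in county_population.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

A quick sanity check on any percent-of-total column is that the shares add up to 100 percent.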
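Rates: A rate allows you to make apples-to-apples comparisons for entities that may not be of equal size. Example 4 shows the number of property crimes per 100 people at select universities: =(H2/B2)*100. If you wanted to represent the data per 1,000 people, you’d divide the population by 1,000 first and then divide the crimes by the result: 54,942/1,000=X; 378/X. Multiplying by 1,000 instead of 100, as in =(H2/B2)*1000, gives the same answer.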
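The arithmetic can be checked with a few lines of Python, using the two figures quoted in the text (378 property crimes and a population of 54,942). The helper name `rate_per` is an assumption made for this sketch.

```python
def rate_per(events, population, per=100):
    """Return events per `per` people."""
    return events / population * per

crimes, population = 378, 54_942  # figures quoted in the text

per_100 = rate_per(crimes, population, per=100)
per_1000 = rate_per(crimes, population, per=1000)
print(round(per_100, 2))   # 0.69
print(round(per_1000, 2))  # 6.88

# The two-step version from the text: divide the population by 1,000 first.
x = population / 1000
print(round(crimes / x, 2))  # 6.88
```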
Finding the percentage differences between two numbers. In Example 5, use =(b2-c2)/c2. An easy way to remember this formula is to think of “NO” with an extra “O” =(New number-Old number)/Old number. In Example 5, using the formula, we see the number of forcible rapes declined by about 10.8 percent from 2004 to 2008. Why decline? Because the formula spits out a negative value (-.1080118694). Another way of remembering percent changes is by relying on fractions you know well, like those with the numbers 3 and 4. Went from 3 to 4? You grew by a third [(4-3)/3]. Dropped from 4 to 3? You declined by a quarter [(4-3)/4], which is another way of saying you grew by [(3-4)/4], or negative ¼.
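The “NOO” formula translates directly into a small Python function, sketched here with the 3-and-4 example from the text. The function name is hypothetical.

```python
def percent_change(new, old):
    """(New - Old) / Old, the same arithmetic as =(b2-c2)/c2."""
    return (new - old) / old

# Went from 3 to 4: grew by a third.
print(f"{percent_change(new=4, old=3):.1%}")  # 33.3%

# Dropped from 4 to 3: declined by a quarter (a negative result means a decline).
print(f"{percent_change(new=3, old=4):.1%}")  # -25.0%
```

Note that the two changes are not mirror images: the same one-unit move is a bigger share of 3 than of 4, which is why the old number always goes in the denominator.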
The “If” function allows you to determine if a condition is true or false according to your own parameters. Like many other formulas, this one is executed in a new column. For example, if your story calls for determining the proficiency levels of various schools, and you have the raw scores but also know the cut-off points for each achievement level, you can create an If statement that shows in a new column whether the achievement level was at-or-above proficient or not. The following example uses a simpler If statement: =if(a2<b2,"Too much","OK") [See example 11]
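The same true-or-false logic can be sketched in Python. The first function mirrors the If statement in the text; the second is a hypothetical proficiency check, and the cut score of 70 is an assumption chosen only for illustration.

```python
def label(a, b):
    # Same logic as the If statement: first result if the test is true, otherwise the second.
    return "Too much" if a < b else "OK"

def proficiency(score, cut_score=70):
    # Hypothetical cut score; substitute the real cut-off for your state's test.
    return "Proficient" if score >= cut_score else "Not proficient"

print(label(5, 10))     # Too much
print(label(10, 5))     # OK
print(proficiency(70))  # Proficient
print(proficiency(69))  # Not proficient
```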
The sumif formula allows you to sum information fitting a particular condition. The following formula will sum the number of property crimes committed in cities with a population over 15,000: =sumif(b2:b8,">15000",c2:c8) [See Example 12]
The formula breaks down like this:
b2:b8 is the population of each town
“>15000” seeks information for cities with populations over 15,000 (notice the quotes)
c2:c8 is the range for the property crime data
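A conditional sum of this kind can be sketched in Python as follows. The town names and figures are invented for illustration and do not come from Example 12.

```python
# Hypothetical rows: (town, population, property crimes).
towns = [
    ("Town A", 12_000, 40),
    ("Town B", 18_000, 95),
    ("Town C", 15_000, 60),   # exactly 15,000 is NOT "over 15,000"
    ("Town D", 52_000, 310),
]

# Test the population column, add up the crime column.
total = sum(crimes for _, population, crimes in towns if population > 15_000)
print(total)  # 405
```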
Sumifs: This is a supercharged Sum If that allows the user to find data with added precision. The formula is arranged somewhat differently than Sum If (the range to add comes first), but the logic is similar. Say a story calls for property crime for universities in cities of a certain size, organized by whether the school is private or public. Adding that new variable, public or private, calls for a new formula. Here’s what to do to find the number of property crimes at public universities in towns that have a population of more than 15,000: =sumifs(d2:d8,c2:c8,">15000",b2:b8,"Public") [See Example 13]
The formula breaks down like this:
d2:d8 is the range for the property crime data
c2:c8 is the population of each town
“>15000” seeks information for cities with populations over 15,000 (notice the quotes)
b2:b8 is the range that identifies whether the school is public or private
“Public” seeks information only for public schools
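Adding a second condition looks like this in a Python sketch. As before, the rows are made up for illustration and are not the data in Example 13.

```python
# Hypothetical rows: (school type, town population, property crimes).
schools = [
    ("Public", 12_000, 40),
    ("Public", 18_000, 95),
    ("Private", 52_000, 310),
    ("Public", 30_000, 120),
]

# Both conditions must be true for a row's crimes to be added.
total = sum(
    crimes
    for school_type, population, crimes in schools
    if population > 15_000 and school_type == "Public"
)
print(total)  # 215
```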
Allows you to search for specific information in your spreadsheet. It’s useful for locating information that’s hard to pinpoint in a spreadsheet with thousands of cells. It’s also ideal for joining information that’s not easily accessible, like pairing teacher employee ID numbers in one tab of a sheet with the teacher names in another tab of the sheet. The key thing to consider with vlookup is whether the different columns you want to pair have a common element (name, ID, birthday, etc.). Vlookup can save you hours of work, especially with files that contain hundreds or thousands of rows.
In almost all cases, vlookup requires the user to create a new column to list the results of the formula. Another way of framing vlookup’s function is that it is a search command that lists results according to your specifications.
In Example 6, to find the name of an employee using an employee ID in the same sheet, use the formula =vlookup(e2,a1:c21354,2,false).
Here’s how the formula breaks down: E2 is where you will enter the information that links to the data you want returned; you would enter the employee ID in this case. A1:c21354 is the range of information (notice in the formula that this document has at least 21,354 rows, but in the accompanying spreadsheet we only have 12 rows to save space); 2 is the column of the information to be returned, full name in this case; false indicates you want an exact match. (True will return an approximate match.) Something to stress: Whereas most commands in Excel ask you to name the column by its given letter, vlookup’s third item asks the user to identify the column’s position within the selected range, counting from the range’s first column. Let’s go back to the example above; we’re telling vlookup to look at “2” because we want information pulled from column B, which is the second column of a range that starts at column A. If the column from which you needed information happened to be column F, you’d enter “6” because F is the sixth column from the left (and the sixth letter in the alphabet), provided the range still starts at column A.
Type an Employee ID in cell E2. The name of the corresponding individual will appear in cell D2. [See Example 7]
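Conceptually, an exact-match vlookup is a lookup table keyed on the first column. Here is a minimal Python sketch of that idea; the employee IDs, names and titles are invented.

```python
# Hypothetical rows: (employee ID, full name, job title).
employees = [
    (1001, "Ana Ruiz", "Teacher"),
    (1002, "Ben Cole", "Principal"),
    (1003, "Dee Park", "Librarian"),
]

# Key on the first column and return the second, like asking vlookup for column 2.
name_by_id = {emp_id: name for emp_id, name, _ in employees}

print(name_by_id.get(1002, "#N/A"))  # Ben Cole
print(name_by_id.get(9999, "#N/A"))  # #N/A, which is what Excel shows when an exact match fails
```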
Vlookup (looking at test data across two worksheets):
Sheet 1 in Example 8 shows student names and points that they received. Sheet 2, called Grades, provides the letter grade based on how many points were received.
Let’s do a Vlookup using Sheet 1 and the Grades sheet to determine what letter grades the students received. Here’s the formula: =vlookup(B2,Grades!A$2:B9,2). (If you follow along using the provided spreadsheets, Grades! will instead be 'Example 8(sheet2)'! with single quote marks around the sheet name, followed by the exclamation mark.)
Here’s what it looks like with the formula:
Here’s how the formula breaks down: B2 holds the column with the points in Sheet 1. By putting grades! (or ‘Example 8(sheet2)’ in the accompanying spreadsheet) you tell Excel that you want to draw from information on the Grade worksheet.
A$2:b9 is the cell range on the Grade sheet. Note: The dollar sign $ is used to anchor the formula so it can be copied without producing errors. Here it locks only the top row of the range; to lock the whole range, anchor both ends, as in A$2:B$9. The number 2 at the end of the formula tells Excel to return the information in Column 2 (letter grade) from the Grade sheet (Example 8, sheet 2).
A key point: Almost always with vlookup you’re searching for an exact match, which is expressed using “false” or “0” as the last portion of the vlookup formula. In this instance, though, we didn’t type anything after the “2” for Column 2. Leaving that blank instructs Excel to make an approximate match: it returns the row with the largest value that does not exceed the lookup value, and it requires the first column of the range to be sorted in ascending order. You’d get the same result if you used this formula: =Vlookup(B2,Grades!A$2:B9,2,True).
Note, though, that had you entered the formula using “false,” you’d get no result, because the percentage ranges in the column with the letter grades don’t perfectly match the percentage scores in Sheet 1 that has the student names.
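The approximate-match behavior can be sketched in Python with the standard `bisect` module. The cut-off points and letters below are assumptions for illustration; the real ones are on the Grades sheet. The cut-offs must be sorted from smallest to largest, just as the lookup column must be in Excel.

```python
from bisect import bisect_right

# Hypothetical grade table: minimum points needed for each letter, sorted ascending.
cutoffs = [0, 60, 70, 80, 90]
letters = ["F", "D", "C", "B", "A"]

def letter_grade(points):
    # Find the last cut-off that does not exceed the score.
    position = bisect_right(cutoffs, points) - 1
    if position < 0:
        raise ValueError("score is below the lowest cut-off")
    return letters[position]

print(letter_grade(85))  # B
print(letter_grade(90))  # A
print(letter_grade(59))  # F
```

An exact-match search for 85 would find nothing in this table, because 85 is not one of the cut-offs.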
Here’s the end result for Sheet 1
Paste Special: You’ll be using this one a lot. There are many functions of Paste Special, one of which allows you to copy information in a cell without having to maintain the formula. For instance, if you use the concatenate formula to combine last and first names in one cell, you likely don’t want those cells showing as formulas. To get rid of the formula while maintaining the information, copy the cells you want to change. Once copied, right click and go to Paste Special. Click on “Values and number formats.” Hit OK.
Text to Column
It allows you to divide a cell, such as splitting the first, last and middle name in a cell into three columns. Create a new column next to the column you want to split. Highlight the column you want to divide, go to the DATA ribbon, and click Text to Columns. Note: Google spreadsheets doesn’t have this feature, so practice along using your computer’s Excel software.
“Delimited” should already be selected in the window that comes up. Click Next. Pick what you want to be your delimiter. Do you want the split to occur on a tab, semicolon, comma, space or enter your own in other? (You’ll know how to answer this by looking at the data you want to spread across multiple cells; are the names or values separated by commas, spaces or something else?) The window at the bottom of the “Text to Column” display will give you a preview of what it will look like. Hit next, then finish. To select your own column breaks, use “fixed width” instead of delimited. If the cells you want to split can be organized across three columns, create two additional columns before you make your Text to Column changes. Sometimes you need to be strategic about your Text to Columns, so consider creating as many new columns as are necessary for the job.
Trim: Removes any spaces before or after your text, and reduces runs of spaces between words to a single space. This practice is helpful when preparing your data for complex data crunching. Remember, most formulas need exact matches, and if you’re searching for “dog” in cells where there’s blank space before or after “dog,” you may receive an error. To enter this formula, first create a new column that will feature your trimmed data. Here’s the formula for Trim: =trim(cell name)
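To see why stray spaces break exact matches, consider this short Python sketch. The helper `excel_like_trim` is a hypothetical approximation of the Trim formula, not a replica: it treats every kind of whitespace as a space.

```python
raw = "  dog "
print(raw == "dog")          # False: the extra spaces prevent an exact match
print(raw.strip() == "dog")  # True once leading and trailing spaces are removed

def excel_like_trim(text):
    # Approximation: drop outer whitespace and collapse inner runs to one space.
    return " ".join(text.split())

print(excel_like_trim("  Mary   Ann  Lee "))  # Mary Ann Lee
```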
This function will separate out a specified number of characters, starting from the left of a cell, and display them in a new column. To enter this formula, first create a new column to display the selected characters. Then enter in parentheses after “=left” the name of the cell you want the formula to pull from and the number of characters you want to be shown in the new column:
=left(cell name, number of characters)
This function will separate out a specified number of characters, starting from the right of a cell, and display them in a new column. To enter this formula, first create a new column to display the selected characters. Then enter in parentheses after “=right” the name of the cell you want the formula to pull from and the number of characters you want to be shown in the new column:
=right(cell name, number of characters)
The Left and Right formulas are useful for when you want to do an analysis based on the first few or last few characters of a set of values. An example of usage: Say you have phone numbers including area codes, and wish to create a new column that contains the first five characters of those numbers. (Why five? Because an area code has three numbers and typically a set of parentheses.)
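The phone number example can be sketched in Python with string slicing, which does the same job as Left and Right. The phone number is fictional.

```python
phone = "(214) 555-0100"  # fictional number

area_code = phone[:5]   # first five characters, like =left(cell name, 5)
last_four = phone[-4:]  # last four characters, like =right(cell name, 4)

print(area_code)  # (214)
print(last_four)  # 0100
```

Note that the result is text, so a value such as 0100 keeps its leading zero.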
Proper: Changes any text to proper case (capitalizing the first letter of each word): =proper(cell name)
When working with a new database, make sure there are no duplicate rows. Under the Data tab, click on “Remove Duplicates.” Indicate which column(s) you want to search for duplicates. (Just be mindful of the data in the column: Removing duplicates in a long list of last names may remove unique individuals who coincidentally share identical last names.) Note: This function is not immediately available on Google Spreadsheets, so follow along using your computer’s Excel software.
This function allows you to replace words or numbers in one or more cells in your worksheet. For example, if you want to fill in all blanks with N/A, click on the Home tab and go to “Find & Select.” Next, click “Replace.” Under “Find what,” leave it blank. Under “Replace with,” enter N/A. You can choose to Replace All, Replace (does one at a time) or Find All (lists all applicable cells). If you want to limit Replace to a portion of your worksheet, just highlight the section you desire; otherwise, Excel will search the whole sheet. For Mac users, just use the Command+F keyboard shortcut.
Concatenate: Allows you to combine cells, such as to bring together a last name and first name in one cell. It’s another way of saying “to combine.” One could think of this command as the Text to Column command in reverse. Because you’re combining values from two or more cells, you’ll want to create a new column for your concatenated results. Example 9 shows the results of the formula =concatenate(a2," ",b2)
Example 10 shows the separation of the first name and last name with a comma using this formula, though you can just as well go with first name then last name by reversing the order of the cell names in the formula:
Datedif: Counts days, months and years between two dates, which is ideal for getting a clear sense of the experience level of various teachers in the district you cover. You’ll first need to have a data set that includes a range of dates, say the start date and end date of a particular contract, employment period or academic season. Once you have that clear number, you can proceed to more complex analysis, like comparing years of experience to value-added scores, for example.
For years, use the formula =datedif(oldest date,newest date,"y"). To count months, use an "m" in place of the "y". To count days, use a "d".
The formula to find years is used in Example 14: =datedif(c2,b2,"y")
*Sometimes the formula won’t work unless the dates are organized as mm/dd/yy. To change the layout of your dates, highlight the cells you want to adjust, right click, scroll down to “format cells,” select “date” as the category on the left, then scroll through the options to find month-day-year.
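To double-check a date calculation, here is a minimal Python sketch that counts completed years and months between two dates. The function names and both dates are assumptions made for illustration.

```python
from datetime import date

def full_years(start, end):
    # Completed years: subtract one if the anniversary hasn't arrived yet in the end year.
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years

def full_months(start, end):
    # Completed months: subtract one if the day of the month hasn't been reached yet.
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months

hired = date(2005, 8, 15)  # hypothetical start date
as_of = date(2016, 4, 1)   # hypothetical end date

print(full_years(hired, as_of))   # 10
print(full_months(hired, as_of))  # 127
print((as_of - hired).days)       # total days between the two dates
```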
Networkdays: The networkdays formula returns the number of work days between two dates, counting both the start and end dates and excluding weekends. It excludes holidays only if you supply a list of holiday dates as an optional third item. This is useful for any data mining that calls for finding the number of days a staffer worked to arrive at a per-day salary rate, for example. The formula is =networkdays(oldest date, newest date). Using Example 14, the formula starting at D2 would look like this: =networkdays(c2, b2)
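The counting rule can be sketched in Python as follows. This is an illustration of the idea rather than Excel’s own implementation; the function name, the dates and the holiday are assumptions.

```python
from datetime import date, timedelta

def network_days(start, end, holidays=()):
    """Count Monday-Friday dates from start through end, skipping listed holidays."""
    count = 0
    day = start
    while day <= end:
        # weekday() returns 0-4 for Monday-Friday, 5-6 for the weekend.
        if day.weekday() < 5 and day not in holidays:
            count += 1
        day += timedelta(days=1)
    return count

start = date(2016, 4, 4)   # a Monday
end = date(2016, 4, 15)    # the Friday of the following week

print(network_days(start, end))                                # 10
print(network_days(start, end, holidays={date(2016, 4, 8)}))   # 9
```

Dividing a salary by a count like this gives the per-day rate mentioned above.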
Index/Match: This formula goes a step further than vlookup, which requires your lookup column to be to the left of the returned value. The index/match formula doesn’t have the same limitation. The chart below in Example 15 uses the index/match function to look up gifts using school name.
Here’s a breakdown of the formula, =index(c2:c9,match(a14,b2:b9,0)): c2:c9 is the range returned; a14 is where you enter the lookup information; b2:b9 is the range of lookup information. 0 will return an exact match.
For a video on when Index/Match is preferred to Vlookup, check out this Business Insider tutorial.
Index/Match may also be used to merge data from different sources that contain a common reference point, like student or teacher ID numbers or budget codes. Once you become more adept at using vlookup, Index/Match may prove to be a more compelling option for your data-crunching needs.
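The two-step logic, find the position and then pull the value at that position from another column, can be sketched in Python. The school names and gift amounts are invented and are not the data in Example 15.

```python
# Hypothetical columns of equal length.
schools = ["Adams", "Baker", "Clark"]
gifts = [2500, 400, 1200]

def index_match(return_range, lookup_value, lookup_range):
    # "Match": find the position of the value (raises ValueError if it is absent).
    position = lookup_range.index(lookup_value)
    # "Index": return the item at that same position in the other column.
    return return_range[position]

print(index_match(gifts, "Baker", schools))  # 400
print(index_match(schools, 1200, gifts))     # Clark (the lookup works in either direction)
```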
Pivot tables allow for a quick breakdown of similar information that appears in multiple rows or columns. Unlike other Excel functions, pivot tables aren’t as reliant on formulas; instead, these tables allow you to drag and click rows and columns in ways that make it easier for you to spot trends or find the sum of some or all of your variables. Even better, the pivot table comes with an easy-to-use dashboard so you can better visualize the steps you want to take to analyze your data. Here’s a solid tutorial to get you started on using pivot tables.
To find the function on your screen, look for the Pivot Table button. In recent Windows versions of Excel it sits on the Insert tab, while some Mac versions place it on the Data tab. The actual location depends on which Excel version you use, so here are two websites, one for Macs, the other for PCs, that will help you find pivot table on your machine.
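Behind the dashboard, a pivot table groups rows that share a value and then summarizes each group. A rough Python sketch of that grouping, using invented job titles and salaries, looks like this:

```python
from collections import defaultdict

# Hypothetical rows: (job title, salary).
rows = [
    ("Teacher", 47_000),
    ("Principal", 90_000),
    ("Teacher", 52_000),
    ("Librarian", 45_000),
]

totals = defaultdict(int)
counts = defaultdict(int)
for job, salary in rows:
    totals[job] += salary  # sum of salaries per job title
    counts[job] += 1       # number of employees per job title

for job in sorted(totals):
    print(job, counts[job], totals[job], totals[job] / counts[job])
```

Each printed line corresponds to one row of a pivot table that has job title in the rows area and count, sum and average of salary in the values area.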