Chapter 3. Method

The method of this research must provide a way to gather empirical data that either support or do not support the hypothesis, which states that new communication technologies have made it possible for nonpublics and the general public to have significant consequences for an organization. In my opinion, such data can be obtained by examining a case study of an organizational communication crisis, during which an organization is associated with negative information and/or opinion that has caused, or may cause, significant negative consequences for that organization.

The data required to support or not support the research hypothesis must show who was responsible for initiating, as well as continuously supporting, the negative information that caused the crisis. The key data are the identities of the individuals or groups responsible for this communication activity. If those responsible can be aligned with a specific public, identifiable by an existing method of segmenting the general public into individual strategic publics, the hypothesis will not be supported, for nonpublics and the general public had no significant effect on the situation. However, if all or most of the individuals responsible for the crisis cannot be aligned with a specific public, the hypothesis will be supported, for such data will demonstrate that individuals representing nonpublics and the general public have caused significant negative consequences for an organization.

Justification of the Method

To collect the data required to test the research hypothesis, I chose blog posts as the main source of data, with mainstream media articles as additional evidence of negative publicity, and the Edelman/Wal-Mart crisis created by the “Wal-Marting Across America” fake blog campaign as the subject matter. News articles were collected by manually searching the Lexis-Nexis database.
Blog posts were automatically collected from the Internet by a computer program written specifically for this research.

Selecting Blog Posts as the Main Source of Data

A blog is a website, usually frequently updated, with its content arranged chronologically. However, what makes blogs unique and particularly suitable for this research is the link structure that connects them to each other. Tremayne (2007) explains:

The social ties [of a blog author (blogger)] are explicitly designated when a blogger provides a link to another blog…. Bloggers link to other blogs within their posts, typically to respond to a point another has made or to direct readers to an item the blogger found interesting or useful…. Collectively, these links and the blogs connected by them comprise the blogosphere. (p. xi)

Links can appear in the text of the post, in the comments section, in the trackback and pingback sections (both are mechanisms for facilitating and improving cross-blog discussions), and in the blogroll section – “a list of links provided by a blogger to inform readers of work he or she considers useful or of high quality” (Tremayne, 2007, p. xi). Through these links, as well as through their multiple feedback/discussion features, “blogs have the capacity to be virtual communities” (Sundar and Edwards, 2007) or, according to Johnson (2007), have become “the closest thing to a genuinely self-organizing community that the web has yet produced” (as cited in Rutigliano, 2007, p. 229).

With regard to the readership of blogs, a Pew Internet (2005a) study reported data that clearly demonstrate the blogosphere’s overall popularity, as well as its growth:

By the end of 2004 blogs had established themselves as a key part of online culture. Two surveys by the Pew Internet & American Life Project in November established new contours for the blogosphere and its popularity:

Figure 1 summarizes some of these findings.
Figure 1. Blogosphere Growth.

Even though the study also showed that blogs have not yet gained the recognition of mainstream media – “blog-reading audience is about 20% of the size of the newspaper-reading population” and “only 38% of all internet users know what a blog is” (Pew Internet, 2005b) – it is possible to conclude that the blogosphere is both adequate and appropriate as the main source of data for this research.

The link structure of the blogosphere makes it particularly suitable for this research. In addition, posts and comments are permanently stored together with the date and, often, the exact time of their creation. This makes the blogosphere unique as a source of data for research. Traditional content analysis can be used as a descriptive tool only. Pavlik (1987) observed that it “does not allow the researcher to draw conclusions about cause-and-effect relationships. Sending a message does not guarantee its being received or processed in the desired fashion” (p. 40). With the blogosphere, however, one can exploit blog archives and the link structure to trace chronologically the spread of a particular story or theme, as well as the development of the discussion around it. In a similar way, one can derive the cause-and-effect relationships between the discussion in the blogosphere and articles in mainstream media, which may suggest that one medium, the blogosphere or the mainstream media, was the cause of publicity in the other. Therefore, when the blogosphere generates negative publicity around some issue and the story is then picked up by mainstream media, it is reasonable to conclude that bloggers caused negative consequences for the organization involved, provided that negative publicity in mainstream media can be considered, without any doubt, a negative consequence.
Selecting the Edelman/Wal-Mart Crisis as the Subject

The chosen subject of this research is the Edelman/Wal-Mart crisis caused by the “Wal-Marting Across America” blog:

It all started [on September 27, 2006], when a folksy blog called Wal-Marting Across America was set up. The site featured the musings of a couple known only as Jim and Laura as they drove cross country in an RV, and included regular interviews with Wal-Mart workers, who were dependably happy about the company and their working conditions. (Gogoi, 2006a)

The first suspicions were expressed on October 3 on The Writing On the Wal – an activist blog dedicated to Wal-Mart issues – in the form of an open letter to Jim and Laura challenging them to reveal their true identities and financial sources (Rees, 2006). On October 9, Business Week (businessweek.com) published an article revealing the truth behind the blog:

The story shot down speculation that Jim and Laura weren't real people, identifying the woman as Laura St. Claire, a freelance writer and an employee at the U.S. Treasury department. But it also disclosed that Wal-Mart was paying plenty for the couple's support, including money for renting the RV, gas, and fees for writing the blog. (Gogoi, 2006a)

What followed was outrage among bloggers: “once bloggers heard that Jim and Laura had undisclosed benefactors, they were furious” (Gogoi, 2006a), which fueled a conversation on multiple blogs that lasted for more than a month. The discussion mainly focused on Edelman, the public relations agency responsible for creating the blog, and questioned the ethical choices and communication skills of the agency’s top management, especially after a week-long silence from Edelman and an apology finally posted by the agency’s CEO, which, arguably, caused even more controversy than the original story.
As a result, Wal-Mart made a statement that the company had nothing to do with the campaign, the blog was removed (a screenshot of the blog web page is presented in Appendix A), and Richard Edelman, the agency’s CEO, posted an apology on his own blog, which ignited further debate and criticism from other bloggers. Thus, in this situation, the blogosphere, arguably, created a public relations disaster – i.e., significant negative consequences – for one of the world’s largest public relations agencies. The question I will try to answer in this research is which publics were responsible for initiating and maintaining the online conversation that led to these negative consequences. Answering this question will support or not support the main hypothesis of this thesis.

Selecting the Sources of Relevant Data

The first step in collecting the relevant data is to determine the criteria for relevance. There is plenty of information on the Internet about Edelman and even more about Wal-Mart. To retrieve information relevant to this study, after trying different combinations of search terms, the following search query was selected: “edelman blog wal-marting or walmarting,” which proved to be optimal compared to alternative combinations.

The Lexis-Nexis database was selected as the source for mainstream media articles. Selecting a source for blog posts proved to be more complicated. The traditional search approach using the Lexis-Nexis database returned only four posts and was discarded as inadequate. The next obvious choice was manually searching the Internet. Using a general-purpose search engine proved inefficient: Google returned more than 12,100 results, Yahoo more than 2,100, and MSN more than 9,300. The main problem with these results was relevance: even though each result most likely contained all the search terms, most of them were not blog posts, which made them irrelevant for this research.
A better source for blog posts is a blog search engine. After examining the two most popular blog search services – Technorati (technorati.com) and Google Blog Search (blogsearch.google.com) – I selected Google. While the technical differences between these two search engines are beyond the scope of this research, Google was selected for the following reasons:
Computer Science Approach to Collecting Data

Collecting relevant news articles from mainstream media proved to be trivial: a total of 18 articles were manually retrieved using the Lexis-Nexis database. However, even after selecting an appropriate source, collecting relevant blog posts introduced significant complexity. The main idea behind selecting the blogosphere as a source of data was to explore the “conversation” – i.e., determine how, when and by whom it was started, how it developed, how long it lasted and how many participants it attracted. Examining the results returned from Google manually would only provide information about how many participants the conversation attracted and when each post was created. A larger problem was that each post contained links to other web pages, some of which were blogs, which might contain relevant posts. Each of these potentially relevant posts could have more links to more relevant sources. Discovering this network of interlinked relevant posts would provide the necessary framework for examining the conversation. However, due to the number of links to potentially relevant blog posts, it was impossible to construct such a network manually within a reasonable period of time. To solve this problem, I chose to use a computer science approach and wrote a program which automatically discovered all the relevant blog posts linked directly or indirectly to the initial set of results obtained from the Google blog search engine.

Collecting the Data

The first step in collecting the data was running the initial search for the “edelman blog wal-marting or walmarting” search terms using the Google blog search tool, which returned 435 results. After removing duplicate results, links to irrelevant pages (such as lists of links or blog directories), pages requiring signing-in, inaccessible pages, nonexistent pages and pages in other languages, there were 108 results left.
This set of URLs (URL stands for uniform resource locator, which is the Internet address of a web page) was used as input to the program, which proceeded to automatically discover all related posts.

Description of the Program

The program can be described as a focused crawler – a computer science term for a program which explores the hyperlink structure of the World Wide Web, focusing on a specific type of web page. The program’s most basic functionality can be described as connecting to a remote web server and retrieving the contents of a web page based on a provided URL, just as a person views a web page through an Internet browser like Mozilla Firefox or Internet Explorer. The content of a web page is retrieved as a string of text, which can be stored in a file or a database, viewed with the help of a text editor (or a word processor), and searched for terms, phrases or patterns.

The program executes by continuously retrieving web page content based on a collection of URLs, analyzing that content to determine the retrieved page’s relevance, and processing relevant pages by extracting all the links contained in the page and adding those links to the set of URLs waiting to be processed. As a result, the program (also referred to as the crawler, or the spider) will keep exploring the link structure, adding relevant pages, for as long as it has URLs to process. If the criteria for relevance are tight enough, the program will eventually terminate because there will be no more URLs to process. However, if the criteria for relevance are not tight enough, the set of URLs to explore will quickly grow beyond a manageable size. The current program implementation proved to be sound: the program executed and terminated normally.
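The crawl loop described above can be illustrated with a short sketch. It is written in Python for brevity and is not the original program. The names `crawl`, `contains_terms` and `WEB` are hypothetical, the relevance test is only one possible reading of the search query, and the toy in-memory "web" stands in for real HTTP retrieval. Relative links and the two link-relevance criteria are not modeled.

```python
import re
from collections import deque
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

# Simplification: only double-quoted, absolute href values are extracted.
HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def contains_terms(text: str) -> bool:
    """Assumed reading of the query: edelman AND blog AND (wal-marting OR walmarting)."""
    t = text.lower()
    return "edelman" in t and "blog" in t and ("wal-marting" in t or "walmarting" in t)


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    is_relevant: Callable[[str], bool],
) -> Tuple[Dict[str, str], Set[Tuple[str, str]]]:
    """Explore links breadth-first, expanding only pages judged relevant.

    Returns the relevant pages (URL -> content) and the set of
    (source URL, target URL) links found on those pages.
    """
    queue = deque(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    queued = set(queue)                  # URLs currently waiting in the queue
    processed: Set[str] = set()
    relevant: Dict[str, str] = {}
    links: Set[Tuple[str, str]] = set()

    while queue:
        url = queue.popleft()
        queued.discard(url)
        processed.add(url)
        content = fetch(url)
        if content is None or not is_relevant(content):
            continue                     # irrelevant or missing pages are not expanded
        relevant[url] = content
        for target in HREF_RE.findall(content):
            links.add((url, target))
            if target not in queued and target not in processed:
                queue.append(target)
                queued.add(target)
    return relevant, links


# Toy stand-in for the web: URL -> page content (all values are made up).
WEB = {
    "http://a.example/1": 'Edelman blog Wal-Marting <a href="http://b.example/1">b</a> '
                          '<a href="http://c.example/1">c</a>',
    "http://b.example/1": 'edelman walmarting blog <a href="http://a.example/1">a</a>',
    "http://c.example/1": 'unrelated page <a href="http://d.example/1">d</a>',
}

pages, links = crawl(["http://a.example/1"], WEB.get, contains_terms)
print(sorted(pages))
```

In this sketch the page at `c.example` is retrieved but not expanded, so `d.example` is never visited. This is the property that lets a focused crawler terminate when the relevance criteria are tight enough.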
Figure 2 provides a more precise description of the program’s algorithm:

- Add 108 results from the initial Google search to URL queue
- Repeat while URL queue is not empty:
  - Remove next URL from queue
  - Add the removed URL to the set of processed URLs
  - Retrieve web page by URL
  - If the retrieved web page contains the search terms:
    - Add the URL and the web page content to the set of relevant pages
    - Retrieve all relevant links from web page
    - For each retrieved link:
      - Add the link to the set of retrieved links
      - If the link points to a URL which is not in the set of retrieved links, is not in the URL queue, and is not in the set of processed URLs:
        - Add URL to queue

Figure 2. URL Collection Algorithm.

The relevance of a link was determined by two criteria:
The program was written in the C# language for the .NET platform and was executed on a desktop computer running Windows XP. The program terminated after 4.5 hours, having explored a total of 11,935 web pages. It discovered 477 relevant pages and 36,694 links.

Description of the Final Data Set

Each of the 477 pages discovered by the program was manually examined. The first step was to eliminate irrelevant pages, such as duplicates (the same post appearing on other blogs or in different sections of the same blog), lists of links or blog directories, pages requiring signing-in, inaccessible pages, nonexistent pages, and pages in other languages, as well as posts without a date, for they would be of little use for this particular research. The remaining 201 web pages were manually processed; the title, main body, author, posted date and time, and the number of comments for each page were stored in a relational database.

A few of the web pages in the final set turned out to be mainstream media articles, such as stories posted on cnn.com, businessweek.com and a few other online publications. These pages were not excluded for two reasons: first, in some cases it is hard to determine whether a post belongs to a blog or to a mainstream media publication; second, and most importantly, most of them have comments and are linked to from blogs, which makes them part of the conversation going on in the blogosphere. Therefore, these pages will be treated and referred to as blog posts. There were five blog posts which were posted on two separate blogs. These cases were treated as separate posts, since each received different comments and was linked to from different blogs.

The 11,935 links were automatically processed by running multiple scripts which normalized the URLs, bringing them to a common standard, discovered duplicate links, and removed links to pages that were not part of the finalized set of blog posts.
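The link clean-up step can be sketched as follows. This is a minimal Python illustration, not the scripts actually used. The thesis does not specify what the "common standard" for URLs was, so the rules in the hypothetical `normalize_url` function (lower-casing the host, dropping a leading "www.", default ports, trailing slashes and fragments) are assumptions, as is the helper name `clean_links`.

```python
from urllib.parse import urlsplit, urlunsplit
from typing import Iterable, Set, Tuple


def normalize_url(url: str) -> str:
    """Bring a URL to one canonical form (assumed rules, see lead-in)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    port = parts.port
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"          # keep only non-default ports
    path = parts.path.rstrip("/") or "/"
    # The fragment (e.g. "#comments") is dropped; the query string is kept.
    return urlunsplit((scheme, host, path, parts.query, ""))


def clean_links(
    raw_links: Iterable[Tuple[str, str]],
    final_posts: Iterable[str],
) -> Set[Tuple[str, str]]:
    """Normalize both ends of each link, drop duplicates, and keep only
    links whose source and target are both in the final set of posts."""
    final = {normalize_url(u) for u in final_posts}
    cleaned = set()
    for source, target in raw_links:
        s, t = normalize_url(source), normalize_url(target)
        if s in final and t in final:
            cleaned.add((s, t))          # the set removes duplicate links
    return cleaned


# Made-up example data.
raw = [
    ("http://WWW.a.example/post/", "http://b.example/post#c1"),
    ("http://a.example/post", "http://b.example/post"),      # duplicate after normalization
    ("http://a.example/post", "http://elsewhere.example/"),  # target outside the final set
]
final_posts = ["http://a.example/post", "http://b.example/post"]
cleaned = clean_links(raw, final_posts)
print(cleaned)
```

Using a set of (source, target) pairs means that two links which differ only in spelling collapse into a single connection once both ends are normalized.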
The remaining set of 774 links, connecting only the final blog posts, was also stored in a database. It is important to note that, since the textual part of each link, as well as the total number of links connecting any two blogs, was irrelevant, only distinct links were stored: i.e., if blog A linked to blog B through N links, only one link, or connection, was stored. Therefore, the final data set consisted of three database tables: one containing blog post data (such as title, body, author, date and time of the post, and the number of comments), another containing summary data for each blog, and a third containing the linkages between blog posts. This kind of data model makes it possible to derive various quantitative data by running selection queries.

Data Analysis

The data analysis will consist of several parts. First, I will run several selection queries on the data to reveal statistical data, such as the number of blogs each blog links to (counting only relevant links – i.e., links to blog posts in the data collection), the number of blogs that link to each blog, etc. The results of these queries will help me identify the blog posts which had the most effect on the discussion according to several criteria. I will proceed with a quantitative analysis of the blog posts to reveal the timeline of the conversation (i.e., how it developed chronologically). The next step will be a qualitative analysis of the content of all blogs (as well as their comments sections) in order to classify each blog as positive, negative, neutral or balanced. I will then perform the same qualitative analysis on the mainstream media articles obtained from Lexis-Nexis.
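The data model and the kind of selection queries mentioned above can be illustrated with a small sketch using Python's built-in sqlite3 module. The table and column names are assumptions, the rows are made-up sample data, and the per-blog summary table is omitted because its columns are not described here. Excluding links between posts on the same blog is also an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (
    id            INTEGER PRIMARY KEY,
    blog          TEXT NOT NULL,
    title         TEXT,
    body          TEXT,
    author        TEXT,
    posted_at     TEXT,      -- ISO 8601 text, so it sorts chronologically
    comment_count INTEGER
);
CREATE TABLE links (
    source_id INTEGER NOT NULL REFERENCES posts(id),
    target_id INTEGER NOT NULL REFERENCES posts(id),
    PRIMARY KEY (source_id, target_id)   -- only distinct links are stored
);
""")

# Made-up sample rows: four posts on three blogs.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "A", "post 1", "", "author a", "2006-10-10T09:00", 3),
        (2, "B", "post 2", "", "author b", "2006-10-12T14:30", 12),
        (3, "C", "post 3", "", "author c", "2006-10-11T08:15", 0),
        (4, "A", "post 4", "", "author a", "2006-10-13T17:45", 5),
    ],
)
# INSERT OR IGNORE silently drops the repeated (1, 2) link.
conn.executemany(
    "INSERT OR IGNORE INTO links VALUES (?, ?)",
    [(1, 2), (1, 2), (3, 2), (4, 2), (2, 1), (4, 3)],
)

# Number of other blogs each blog links to.
out_degree = conn.execute("""
    SELECT src.blog, COUNT(DISTINCT dst.blog)
    FROM links
    JOIN posts AS src ON src.id = links.source_id
    JOIN posts AS dst ON dst.id = links.target_id
    WHERE src.blog <> dst.blog
    GROUP BY src.blog
    ORDER BY src.blog
""").fetchall()

# Number of other blogs that link to each blog.
in_degree = conn.execute("""
    SELECT dst.blog, COUNT(DISTINCT src.blog)
    FROM links
    JOIN posts AS src ON src.id = links.source_id
    JOIN posts AS dst ON dst.id = links.target_id
    WHERE src.blog <> dst.blog
    GROUP BY dst.blog
    ORDER BY dst.blog
""").fetchall()

# Timeline of the conversation: post ids in chronological order.
timeline = [row[0] for row in conn.execute("SELECT id FROM posts ORDER BY posted_at")]

link_count = conn.execute("SELECT COUNT(*) FROM links").fetchone()[0]
print(out_degree, in_degree, timeline, link_count)
```

Declaring the pair (source_id, target_id) as the primary key is one simple way to guarantee that a connection between two posts is recorded only once, however many times the link occurs on the page.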
The final step of the data analysis will be to select a set of the most influential blogs with regard to this particular conversation, based on the results of all previous analyses, identify the authors of these blogs, and determine whether they can be considered members of any key public, as defined by the stakeholder or situational approach to segmenting the general public.

Quantitative Analysis of Blog Posts

Quantitative analysis of the blog post data will be based on the following reports:
The values of N, M and K for the last report will be determined based on the amount of relevant data and will define the set of blogs to be analyzed qualitatively.

Qualitative Analysis of Blog Posts and Mainstream Media Articles

The purpose of the qualitative analysis of the most influential blog posts will be to determine whether the posts and the comments made on these posts were overall negative, positive, balanced, or neutral. Based on the results, I will estimate what percentage of the conversation (calculated from the total number of both posts and comments) was negative. I will also discuss the extent to which the posts were negative: while this analysis will not offer any concrete quantifiable results, it will help illustrate the specifics of the negative consequences caused to Edelman by the discussion in the blogosphere. The news articles will be analyzed in a similar way to determine the negative publicity related to the case. A list of negative news articles and negative blog posts, sorted by date, may also reveal the cause-and-effect relationship between the blogosphere and mainstream media, provided one mentions the other as a source.

Assigning Blog Authors to Publics

The final step of the data analysis will be obtaining the identities of the authors of the most influential blogs (determined in the previous steps) and attempting to map these authors to specific publics, defined by either the stakeholder or the situational approach. The source data for this step will vary. In some cases, a blog description (usually, the “about” web page) will be sufficient to determine that the blog represents an activist group. In other cases, a blogger’s biography (often available as a separate web page) will provide clues to his or her professional affiliation, which will help align the author with a specific public.
If there are no data about the author’s identity, I will analyze other posts on that blog by looking at the list of categories available on most blogs, to determine the author’s main interests and the main topics covered on the blog. These data will help me speculate on the author’s possible alignment with a particular public. The process of mapping a blog author to a public will be unique in every case, so each mapping will be discussed separately. I will attempt to align each author with a known type of public (such as employees, media, activist group, etc.); if such an alignment is not possible, I will assign the author to the “nonpublic” category. The general public consists of both publics and nonpublics (i.e., everyone). Thus, if all or most of the authors of the most influential blogs are assigned to nonpublics, I will conclude that the general public caused significant negative consequences for the organization involved in the crisis. However, if that is not the case, my hypothesis will not be supported.

Limitations of the Method

The main limitation of the methodology is that it does not analyze the entire collection of blog posts related to the case study. Gathering additional relevant posts might be possible by expanding the set of search terms used to determine web page relevance with more general terms, such as “wal-mart,” and by not requiring that all terms appear on a page. For example, a blog post about the Wal-Mart fake blog that mentions neither the name of the blog nor the name of the public relations agency responsible for it would still be relevant, yet it was not captured by the selected data collection method.

Another limitation is that the comments on each post, as well as the trackback and pingback sections, were not processed in detail: only the total number of comments was recorded, whereas the trackback and pingback sections were only used as a source of outbound links, just like the rest of the web page’s content.
Recording each comment’s author, posting date and time, as well as the location of the author’s blog (when applicable), would provide a far more detailed network of posts and comments, which would allow a much more detailed and precise analysis of the network and the conversation. However, neither of the described limitations affects the data significantly. The constructed network, which represents a chronological model of the conversation, contains the most relevant blog posts and is therefore an adequate representation of the data.