Wikipedia- A Big Data Perspective and SWOT analysis


Authors: Ozan Kaya, Inti Mendoza Estrada, Rodrigo Saavedra Sotelo, & Ana Paz


The renowned website Wikipedia was first launched in 2001 and has been a source of information for many ever since. It is known for its wide variety of articles, which are written and can be edited by its users. With over 19 million entries in over 282 languages, Wikipedia has already established itself as a world heritage website. Given its importance as a research tool and source of information, we believe it is important to analyze the effect of Wikipedia on Big Data.

Wikipedia and Big Data


Wikipedia’s societal effect makes it an important Big Data example. With a constantly growing article database that also stores previous versions and user information, Wikipedia stands among the largest and still growing databases in the world. The capacity Wikipedia needs to keep up with this information exceeds what traditional database solutions can deliver and calls for big data approaches. The challenges include editing, storage, site-wide validity control, querying, and more.

Wikipedia is a special instance of big data: as Adams and Brückner describe in their article, it “shapes people’s frame of reference and because it is a window into construction of new bodies of knowledge”. Wikipedia is in constant evolution and growth. The access rate is also very high, with the English version alone getting over 8 million visits per hour. The current Wikipedia takes up to 44 GB of space when revision records and the like are excluded, and multiple terabytes when everything is included.
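To put the access rate in perspective, the hourly figure can be converted into other units. The short sketch below does only that arithmetic. It assumes traffic is spread evenly over time, which real traffic is not, and it treats the 8 million figure as a lower bound, since the text says "over". The function names are ours, chosen for illustration.

```python
# Back-of-the-envelope conversion of the hourly visit figure quoted above.
# Assumption: visits are spread evenly, so these are averages, not peaks.

VISITS_PER_HOUR = 8_000_000  # lower bound quoted for the English version
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def visits_per_second(visits_per_hour: int) -> float:
    """Average visits per second for a given hourly total."""
    return visits_per_hour / SECONDS_PER_HOUR


def visits_per_day(visits_per_hour: int) -> int:
    """Total visits per day for a given hourly total."""
    return visits_per_hour * HOURS_PER_DAY


if __name__ == "__main__":
    print(f"about {visits_per_second(VISITS_PER_HOUR):.0f} visits per second")
    print(f"about {visits_per_day(VISITS_PER_HOUR):,} visits per day")
```

Under these assumptions the English version alone averages more than two thousand visits every second.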

Why Wikipedia is a strong big data business and its flaws (strengths and weaknesses)

Wikipedia is completely free. Anyone that has an Internet connection will have access to millions of topics. Information distributed so freely, spanning across wide fields, and available worldwide is a new concept that came to life in recent years. In a manner of speaking, Wikipedia has become a source of information that can be accessed by anyone, including those that might have previously not had the chance to gather any sort of information.

Before the dawn of web-wide encyclopedias, printed tomes were used to keep up with information. These printed encyclopedias could easily lose their validity over the course of a year and could not be updated unless a new edition was published, which wasted space and resources. Wikipedia, on the other hand, can be updated by the hour while also storing previous versions of an article. This makes Wikipedia an impressive, constantly updated collection of data.
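The idea of updating an article while keeping every previous version can be illustrated with a small data structure. This is a minimal sketch under our own assumptions, not a description of how Wikipedia's software actually stores revisions. The `Revision` and `Article` classes and their fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class Revision:
    """One saved version of an article (hypothetical structure)."""
    text: str
    editor: str
    timestamp: datetime


@dataclass
class Article:
    """An article that never overwrites its history: every edit is appended."""
    title: str
    revisions: List[Revision] = field(default_factory=list)

    def edit(self, text: str, editor: str) -> None:
        """Store a new version; older versions stay available."""
        self.revisions.append(Revision(text, editor, datetime.now(timezone.utc)))

    @property
    def current(self) -> str:
        """Text of the latest version, or an empty string if there is none."""
        return self.revisions[-1].text if self.revisions else ""

    def revert(self, index: int, editor: str) -> None:
        """Restore an older version by saving it again as the newest one."""
        self.edit(self.revisions[index].text, editor)


if __name__ == "__main__":
    page = Article("Big data")
    page.edit("First draft.", "alice")
    page.edit("Vandalized text.", "mallory")
    page.revert(0, "bob")  # undo the bad edit without losing the record of it
    print(page.current, len(page.revisions))
```

Because a revert is itself stored as a new version, the history only ever grows. This is one reason the full dataset is so much larger than the current articles alone.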

Furthermore, Wikipedia is a great place for anyone to start their research. Most of the time, information will have a reference, which the visitor can check to obtain information right from the source. This builds the trust necessary to conduct research, while offering a convenient way to find related publications, keywords, and background information. Wikipedia is also unusually resourceful in the areas of popular culture and science.

However, some of the best-known scandals that Wikipedia has faced throughout its existence, and especially in recent years, relate to the display of evidently false or biased data. This is due mainly to the most important and characteristic feature of this virtual encyclopedia: it can be edited by virtually any user on the World Wide Web. This is an obvious open door for vandals and so-called ‘trolls’, but it may also be abused in a more serious way, through the manipulation of controversial information to favor a certain faction. Since editing rules cannot be effectively or regularly enforced on a site with such huge amounts of information, it is a challenge to catch these misleading edits, which may take the form of inaccurate information, harmless humor, derogatory slurs, or even messages of hate.

Situations like these earned the platform its early infamy. Initially, teachers and professors would prohibit students from using it for research tasks. Jimmy Wales, co-founder of the platform, reacted to his creation’s image by making its editing policy stricter in several ways. For instance, users must now register as editors before being able to modify an article, and such modifications do not become visible right away. Even so, the premise of Wikipedia being made ‘for the people and by the people’ still generates controversy and mistrust among users. Nevertheless, it continues to be the most sought-after online platform for factual information.

Wikipedia’s one big weakness is reliability. Scholars using Wikipedia are reminded not to rely on the information the website provides. While Wikipedia editors have made an effort to contribute correct content, it has been reported that some contributors have faked their credentials. This suggests that entries can contain errors of fact and can also be disorganized or even duplicated across Wikipedia.

It is not just Wikipedia (opportunities)

Wikipedia is part of The Wikimedia Foundation. This foundation runs projects such as Wiktionary, Wikiquote, Wikibooks, and Wikidata. Wiktionary, like its sister projects, is a database of information, but one focused on the meanings of words and their translations into other languages. Wikiquote is a database of quotes said by famous people or found in intellectual material, as well as cultural sayings. Wikibooks aims to become a repository of as many free e-books as possible, across as wide a range as possible, in order to make as much information as possible available for free, particularly for students and teachers.

Wikidata acts as a central repository for the data of all its sister projects, but in a structured way, allowing humans and machines alike to edit it as freely as they would in any of the sister projects individually. Wikidata is not confined to storing data for its sister projects, however; it also serves projects outside The Wikimedia Foundation, further increasing opportunities for the foundation and all its projects.

As its sister projects and outside projects grow and expand, Wikipedia indirectly benefits from publicity and even more information.

Let’s all just make a Wikipedia-like website. (threats)

Because of Wikipedia’s wide popularity, the threats to the site are limited to similar websites that have failed to reach a comparable level of popularity. Websites such as Citizendium compete in that they have a similar layout, but they have failed to combine family-friendly dynamics with expert editing of their informational pages.

A closer look at the layout of Citizendium reveals what, to the public eye, looks like a cheap rip-off of Wikipedia, which can lead users to confuse the two or to disregard the website.


Another competitor of Wikipedia is Scholarpedia, which seeks to have experts as the sole editors of its articles. Under its information tab about the authors, it describes itself as a “peer-reviewed encyclopedia written by the leading experts of their respective fields”, yet reading just a little further on the same page, one finds explicit instructions on how to easily join and write one’s own article. This is in fact already easily understood when looking at the logo.


The efforts of several sites like Wikipedia, including this one as well, to unite information and make it available to the public always seem to run into problems of reliability and trustworthiness. Nevertheless, Wikipedia’s popularity has allowed it to triumph over the other websites and remain king of the hill to this day.


In the world of big data, Wikipedia is an enormous step forward in the maintenance and delivery of information. It still has issues to overcome, however, mostly arising from its contradictory policies and purposes. While highly controversial, the website manages to bring together a large body of information and make it available to the public. It is then up to the users of this resource to decide whether or not to rely on and trust the information they gather. We believe that as an initial tool for queries Wikipedia is a great start, yet proper research should not end here.

The following years will show what role Wikipedia will play in human history, whether as a vast encyclopedia or a demanding big data service with storage space running out.

References Retrieved 22-11-2016 Retrieved 22-11-2016 Retrieved 22-11-2016 Retrieved November 24th, 2016 Retrieved November 24th, 2016 Retrieved November 24th, 2016 Retrieved November 24th, 2016


One thought on “Wikipedia- A Big Data Perspective and SWOT analysis

  1. I think wherever there is user input we will have the discussion about data accuracy. Wikipedia faces similar problems to Facebook’s fake news scandal, or any news publication that starts news stories with “Researchers have found that…” . Even with studies, one paper is not enough to prove a fact, we need replication studies, verification etc. I am really curious to see how companies, especially Wikipedia and Facebook, will try to fix this in the future.

