Web Archive Cooperative (WAC) at Harding University
The Web Archive Cooperative (WAC) is a three-year NSF-sponsored project (1008492) whose goal is to address the barriers to accessing and sharing disparate archives of web data (e.g., archived web pages, Twitter updates, user-generated tags, and newsgroup postings). WAC is a joint venture with Hector Garcia-Molina and Andreas Paepcke at Stanford University, Michael L. Nelson at Old Dominion University, and Frank McCown at Harding University. See the Stanford WAC website for work performed at Stanford University and the Web Science and Digital Libraries Research Group blog to keep up to date with work performed at ODU.
Dr. McCown taught a course entitled Introduction to Web Science in Spring 2011 and again in Spring 2013. This was a survey course for junior and senior Computer Science majors that covered some of the fundamental concepts of Web Science: web architecture, web characterization and analysis, web archiving, Web 2.0, web search engines, analyses of social networks, collective intelligence, recommender systems, and clustering algorithms.
See the class web page for slides, homework problems, and project descriptions.
Frank McCown presented a talk entitled Teaching an Introduction to Web Science Course to CS Undergraduates at the Teaching the Web with Web Science workshop at the Web Science 2012 conference (June 21, 2012 in Evanston, IL).
Frank McCown and Michael Nelson presented a poster at SIGCSE 2014 entitled Resources for Teaching Web Science to Computer Science Undergraduates (March 7, 2014 in Atlanta, GA).
Harding computer science majors are hired as researchers each summer to work on WAC projects. Below is a summary of the research performed over the past three summers.
Vivens worked on a project that would automatically locate music videos that are no longer accessible on YouTube.
For example, if someone posted a video to YouTube that was later taken down (because of copyright
violations or other reasons), Vivens's project would help locate the same or a similar video
elsewhere on YouTube.
Date: Summer 2011
Richard examined the Mobile Web from a web archiving perspective.
He built a program that finds web pages designed for mobile devices and measures their similarity
to the corresponding standard (desktop) web pages. He also built a classifier that uses web page
features to determine whether a page was designed for a mobile device or for the desktop.
JCDL 2013 paper: First Steps in Archiving the Mobile Web: Automated Discovery of Mobile Websites
Date: Summer 2012
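As a rough illustration of what "web page features" could mean for such a classifier, the sketch below extracts a few signals from a page and applies a hand-written rule. The features (a viewport meta tag, a handheld stylesheet, an "m." or "mobile." hostname) and the rule are assumptions chosen for illustration; they are not taken from Richard's program or the JCDL paper, and a real classifier would be trained on labeled pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class MobileFeatureParser(HTMLParser):
    """Collects a few hypothetical signals that a page targets mobile devices."""

    def __init__(self):
        super().__init__()
        self.has_viewport_meta = False
        self.has_handheld_stylesheet = False
        self.tag_count = 0

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        attr = {name: (value or "") for name, value in attrs}
        # <meta name="viewport" ...> is common on pages built for small screens.
        if tag == "meta" and attr.get("name", "").lower() == "viewport":
            self.has_viewport_meta = True
        # <link media="handheld" ...> marks a stylesheet aimed at handheld devices.
        if tag == "link" and "handheld" in attr.get("media", "").lower():
            self.has_handheld_stylesheet = True


def extract_features(html, url):
    """Return a dict of illustrative features for one page."""
    parser = MobileFeatureParser()
    parser.feed(html)
    host = (urlparse(url).hostname or "").lower()
    return {
        "has_viewport_meta": parser.has_viewport_meta,
        "has_handheld_stylesheet": parser.has_handheld_stylesheet,
        "mobile_hostname": host.startswith(("m.", "mobile.")),
        "tag_count": parser.tag_count,
    }


def looks_mobile(features):
    """Hand-written stand-in for a trained classifier (an assumption)."""
    return (
        features["has_viewport_meta"]
        or features["has_handheld_stylesheet"]
        or features["mobile_hostname"]
    )
```

In practice the feature dictionary would be fed to a trained model rather than to a fixed rule like `looks_mobile`.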
Daniel continued the work performed by Vivens the year before. He designed a Firefox add-on
called Volitrax which automatically redirects the user to a new location of a music video on
YouTube if the video is removed. The add-on communicates with a web server that stores information
about music videos in Twitter, Tumblr, and Delicious, so the data is likely to survive long
after the application itself has been retired.
JCDL 2013 poster: Semi-Automated Rediscovery of Lost YouTube Music Videos
Date: Summer 2012
Heather completed work on an iOS version of the Memento Browser.
The web browser uses the Memento protocol to allow users to see
archived versions of web pages in a seamless manner. The initial browser code was developed by
Dr. Steve Baber and completed by Heather in August 2011. It is available for download
from iTunes and the Google Play Store, and you can download the source code from
here.
JCDL 2013 poster: A Memento Web Browser for iOS
Date: Summer 2012
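For readers unfamiliar with the Memento protocol, its core is datetime negotiation: the client asks a TimeGate for a resource as it existed near a given time by sending an Accept-Datetime header, and the TimeGate redirects to an archived copy (a memento) whose response carries a Memento-Datetime header. The sketch below only builds such a request; it is not code from the Memento Browser, and the TimeGate base URL in the usage example is a placeholder, not a real service.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_timegate_request(timegate_base, target_uri, when):
    """Return (url, headers) for a Memento datetime-negotiation request.

    timegate_base is assumed to be a URL prefix to which the original
    resource's URI is appended. `when` must be a timezone-aware datetime.
    """
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    when_utc = when.astimezone(timezone.utc)
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return timegate_base + target_uri, headers


# Usage: ask for http://www.example.com/ as it looked on 1 April 2010.
url, headers = build_timegate_request(
    "http://timegate.example.org/timegate/",  # placeholder TimeGate
    "http://www.example.com/",
    datetime(2010, 4, 1, tzinfo=timezone.utc),
)
# A client would now send a GET or HEAD request to `url` with `headers`
# and follow the redirect in the Location header to reach the memento.
```

Keeping request construction separate from the network call makes the negotiation step easy to inspect without contacting an archive.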
Monica continued Richard's work and built a web service called MobileFinder that returns the mobile website URL for
a given desktop URL. Monica also ran an experiment to see how many popular websites follow
Google's guidelines for mobile-optimized
websites.
Date: Summer 2013
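One of Google's recommendations for sites that serve mobile pages from separate URLs is to annotate the desktop page with a `<link rel="alternate">` element whose `media` attribute targets small screens and whose `href` points to the mobile page. The sketch below shows how a tool could read that annotation to map a desktop URL to a mobile URL. It is only an illustration of this one signal; whether MobileFinder or Monica's experiment works this way is not stated here, and the function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateLinkParser(HTMLParser):
    """Collects hrefs of <link rel="alternate"> elements aimed at small screens."""

    def __init__(self):
        super().__init__()
        self.mobile_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        media = attr.get("media", "").lower()
        # Assumption: a max-width media query signals the mobile alternate.
        if "alternate" in rels and "max-width" in media and attr.get("href"):
            self.mobile_hrefs.append(attr["href"])


def find_mobile_url(desktop_url, desktop_html):
    """Return the annotated mobile URL for a desktop page, or None."""
    parser = AlternateLinkParser()
    parser.feed(desktop_html)
    if not parser.mobile_hrefs:
        return None
    # Resolve relative hrefs against the desktop page's URL.
    return urljoin(desktop_url, parser.mobile_hrefs[0])
```

A page with no such annotation yields `None`, in which case a tool would need other signals, such as the response to a request sent with a mobile User-Agent.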
Keith worked on archiving mobile websites using
Heritrix. He modified
Heritrix so the user can indicate whether they want the desktop or the mobile version of the
site. The modified Heritrix crawler uses the MobileFinder web service.
Date: Summer 2013
The WAC Summer Workshop was held at Stanford University on June 29-30, 2012. Twenty-one graduate and undergraduate students from a number of universities attended the workshop free of charge (thanks to NSF grant 1009916). The workshop featured speakers from Stanford, Los Alamos National Laboratory, Internet Archive, UC Berkeley School of Law, California Digital Library, Microsoft Research, and other research labs.
You can read Scott Ainsworth's summary of the workshop here and my write-up about the trip here.
Keith Enlow, Monica Yarbrough, and Frank McCown attended WADL 2013 in Indianapolis immediately following JCDL 2013. McCown gave an overview of archiving the mobile web, followed by Monica, who shared her work developing the MobileFinder web service, which web archivists can use to automatically locate mobile websites. Keith concluded the talk by showing how he had used the web service with Heritrix to archive several mobile websites. Here are the slides from the talk.
This material is based upon work
supported by the National Science Foundation under Grant No. 1008492
(WAC).
Any opinions, findings and conclusions or recommendations expressed in this
material are those of the author(s) and do not necessarily reflect the views
of the National Science Foundation (NSF).