Technograph. Software of subproject B03 “Historical Technography of the Online Comment”

Other scholarly publication
02.06.24

The Technograph is a tool we are developing in B03 that helps to identify commenting sections on news websites and blogs. It currently searches three samples from the Internet Archive, which together comprise 500,000 HTML pages of news websites and blogs from 1996 to 2021.

Our core challenge is to identify commenting sections in the HTML code of those websites. We follow two different approaches for this detection of commenting sections:

The first approach searches for unique code snippets, such as “fb-comments” for the commenting sections that Facebook provides for news websites and blogs. In other words, this approach treats HTML as plain, searchable text. For that purpose, we first compiled a list of 16 popular commenting systems and their unique code snippets. These snippets can then be traced in our corpus of 500,000 web pages. This allows us, e.g., to see when news websites started or stopped using the software Facebook Comments. However, this approach is methodologically biased towards data gathered a priori: we only find the (popular) pieces of software that we listed beforehand.

Image 1: The example above shows a source code snippet from a webpage (Nordwest-Zeitung) from 2016. “fb-comments” indicates that the news website used Facebook Comments at that point in time.
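The first approach can be illustrated with a minimal sketch that treats each page as plain text and checks it against a list of signature snippets. This is not the Technograph's actual code: the function name `detect_systems` and the dictionary `SIGNATURES` are hypothetical, only “fb-comments” is taken from the text above, and the Disqus entry is an assumption added to show how a second system would be listed.

```python
from typing import Dict, List

# Hypothetical signature list. Only "fb-comments" comes from the text above;
# the Disqus entry is an illustrative assumption, not the project's list.
SIGNATURES: Dict[str, List[str]] = {
    "Facebook Comments": ["fb-comments"],
    "Disqus": ["disqus_thread"],
}


def detect_systems(html: str, signatures: Dict[str, List[str]] = SIGNATURES) -> List[str]:
    """Return the names of all listed systems whose snippet occurs in the page.

    The HTML is handled as plain, searchable text: no parsing takes place.
    """
    text = html.lower()
    return sorted(
        name
        for name, snippets in signatures.items()
        if any(snippet.lower() in text for snippet in snippets)
    )


if __name__ == "__main__":
    page = '<div class="fb-comments" data-href="https://example.org/article"></div>'
    print(detect_systems(page))
```

The sketch also makes the bias described above tangible: a commenting system that is missing from `SIGNATURES` can never be found, however often it occurs in the corpus.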

The second approach attempts to counter the popularity bias of the first one: its point of departure is not a list built a priori but the HTML tags within the source code. It draws upon the semantics of HTML documents that can be extracted from the source code. During our research we found, e.g., that commenting sections are frequently embedded in <div> tags. Within those <div> tags we can search for keywords such as “komment*”. This second approach thus treats HTML as a language with its own practiced structure, which can serve as an indicator for commenting sections without relying on popularity. It uses the practices of building websites as a search indicator.

Image 2: The semantic structure of an HTML document with div-tags containing a commenting section.
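The second approach can be sketched in a similar way, this time by parsing the page instead of searching it as text. The following example is an illustration under stated assumptions, not the project's implementation: it looks only at the `id` and `class` attributes of <div> tags, and it adds the English stem “comment” next to the “komment*” keyword mentioned above. The names `CommentDivFinder` and `find_comment_divs` are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# "komment*" is the keyword named in the text; the English stem "comment"
# is an added assumption for illustration.
KEYWORD = re.compile(r"komment|comment", re.IGNORECASE)


class CommentDivFinder(HTMLParser):
    """Collect <div> start tags whose id or class contains the keyword."""

    def __init__(self) -> None:
        super().__init__()
        self.matches: List[Tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "div":
            return
        for name, value in attrs:
            # Attribute values can be None (e.g. <div hidden>), so guard first.
            if name in ("id", "class") and value and KEYWORD.search(value):
                self.matches.append((name, value))


def find_comment_divs(html: str) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs of all matching <div> tags in the page."""
    finder = CommentDivFinder()
    finder.feed(html)
    return finder.matches


if __name__ == "__main__":
    page = '<div id="main"><div class="kommentare-liste"><p>Hallo</p></div></div>'
    print(find_comment_divs(page))
```

Because the search starts from tag structure rather than from a list of known products, a self-built commenting section can be found as long as its markup follows the naming habits of the people who built the site.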

Both approaches point us to practices of the online comment (such as rejecting the use of Facebook’s software at a certain point in time). The second approach, however, also points us to practices of building websites. Here, parsing and extracting tags functions as a point of departure to research production practices.

The Technograph has three functions within the project’s research. First, it helps us understand the data the project has, including its gaps. Second, it visualizes the (de-)popularization or presence of different commenting sections on the (archived) World Wide Web. Third, it serves as a point of departure for finding heterogeneous, mostly non-digital data: the project uses the visualizations to find interview partners, e.g. people responsible for the commenting sections of a website that appeared salient in the visualization. It also uses those visualizations, and the webpages these visualizations point to, in the interviews themselves.

In those interviews, the archived artifacts of online commenting that we identified (such as a Facebook Comments interface) serve as co-interviewers: they allow us to raise questions that cannot simply be asked but must instead be shown or elicited (see, e.g., Paßmann and Gerzen 2024). Because these artifacts were necessarily changed, introduced, or discarded at some point, they may at times have disrupted practices. As a result, these non-human informants help our human informants to reconstruct practices differently, for example by disrupting the biographical narratives that interviews with web actors often produce (ibid.).

The Technograph thus helps to reconstruct historical transformations of practices. Practices mostly rely on tacit or latent knowing. However, most practices are entangled with artifacts of their own, and this entanglement helps in the historical reconstruction of practices.

The project thus develops its Digital Methods in a qualitative, praxiographic framework that takes technology as a (non-human) informant and develops its own, adequate technologies to support this research. Naming the software “Technograph” is meant to make this methodological context visible.

Resources

The interface of the Technograph can be found here:
shiny.sfb1472.uni-siegen.de/b03-technograph

The code of the software is available via GitHub:
github.com/SFB1472/tdp-b03-technograph

Further contributions

Follow the updates? Developing the Technograph as a methodological device to work with data from the Internet Archive

Talk

Presentation at STS Italia Conference 2023, Bologna, by Johannes Paßmann, Lisa Gerzen and Martina Schories

29.06.23