Situating Cognition

 

Wolff-Michael Roth

University of Victoria
 
Paper presented at the 1999 annual meeting of the American Educational Research Association, Montréal, Québec. This work was made possible in part by Grant 410-96-0681 from the Social Sciences and Humanities Research Council of Canada.
 

All correspondence concerning this paper should be addressed to Wolff-Michael Roth, Lansdowne Professor, Applied Cognitive Science, Faculty of Education, University of Victoria, Victoria, BC, Canada V8W 3N4.
E-mail: mroth@uvic.ca.
Tel: 1-250-721-7885
FAX: 1-250-721-7767


Abstract
 

At least since Descartes' analysis of epistemology, which separated body and mind, the source of intelligent action has been attributed to the gray matter of the brain. Despite periods of disinterest in the role of mind in human behavior (e.g., behaviorism), the rise of the computer age led to a blossoming of the mind-as-computer metaphor (Baumgartner & Payr, 1995). In the traditional mind-as-computer metaphor, intelligence was modeled with systems of discrete physical symbols (tokens of declarative knowledge) that were processed according to fixed production rules (procedural knowledge). The resulting cognitive analyses and artificial intelligence systems exhibited considerable power in modeling intelligent behavior (e.g., Anderson, 1985; Larkin, McDermott, Simon, & Simon, 1980). However, this power was limited to so-called well-structured domains, of which physics is one, where knowledge can be made explicit in the form of algorithms and principles. These systems were virtually helpless when it came to messier domains, in particular when it came to modeling everyday competence (Dreyfus, 1992).
Recent research, sometimes referred to as "nouvelle" cognitive science and artificial intelligence, now recognizes that cognition as rule-based reasoning is grounded in, and therefore made possible by, a heretofore unacknowledged, vast background of embodied and tacit knowledge (Agre, 1997; Brooks, 1995; Varela, Thompson, & Rosch, 1993).[1] This new way of understanding cognition--frequently commensurable with connectionist implementations--takes the human experience of "being-in-the-world" as its starting point and fundamental presupposition. This new wave of cognitive science makes the following central assumptions: representations are relational (e.g., Agre & Horswill, 1997), many activities do not require mental representation but use the world as its own representation (e.g., Kirsh, 1995; Lave, 1988), and embodied (physical) experiences are prerequisite for conceptual understanding (e.g., Hayward & Tarr, 1995; Johnson, 1987). Further, anthropological research showed that a considerable amount of what counts as knowing resides in the everyday practices in which people engage without being aware of their structures (e.g., Lave & Wenger, 1991); these practices frequently resist procedural specification, and only have a tenuous relationship to the plans that supposedly have "caused" them (Suchman, 1987). This flurry of research from many different domains (all concerned with knowledge and intelligent actions) clusters around the banner concepts of "situated" or "distributed cognition." From this perspective, it may not even make sense to speak of situated cognition. Rather, aspects of cognition are simply located differently along the agent-in-setting axis: Some structure in cognitive activity arises more from structure in the setting (cf. Scribner's [1986] analysis of how the structure of dairy cases and pallets shapes calculations) and other structure more from structure in the agent (cf. Lave's [1988] analysis of paper-and-pencil shopping problems).
In this, the unit of agent-in-setting I have used in the past is semantically and syntactically equivalent to the recent conceptualization of a "dynamic unit" (Mandelblit & Zachar, 1998).
Recent work in (science and math) education has attempted to work out whether and how distributed cognition and situated learning are viable theoretical concepts in the context of schooling (e.g., Roth, 1996a, 1996b). Important during the development of new theoretical frameworks were the methodologies for collecting and interpreting data. Both activities presuppose that we understand what the appropriate units of analysis are for capturing the different forms of cognition, and that we recognize the often radically different representations various cognitive agents use in their interactions with their (social and material) worlds.
I begin by outlining my epistemological frame, which rests on the assumption that the agent-in-setting is the irreducible unit of analysis. To understand and study cognition, I develop a methodology that seeks to construct those structures that we can associate with cognitive activity. As a heuristic, I assume that structure may lie somewhere along the gradient between agent and setting and has different time scales. Learning is therefore understood as the change in this structure. By identifying individual ontologies and by zooming between concurrent levels of analysis, I identify both coarse and fine structure in cognitive activity. After describing the context of one study, I describe the multiple levels of cognition when high school physics students interact with each other and a modeling software tool.

Epistemological Frame

Following recent work in cognitive science and artificial intelligence, I begin with the presupposition that being-in-the-world--as (social and material) bodies among (social and material) bodies--is the fundamental condition that underlies all cognition. Thus, whatever we do, we always do it in some setting and context--even cogitating Gödel's theorem in complete darkness and all by oneself presupposes the individual's embodied history in the world of mathematics. This presupposition immediately constrains us to focus on structured activities arising from the agent-in-setting unit.
For each agent, the world has objectively-experienced social and physical structures to which it attempts to adapt. However, because attention and perception are functions of the current state of the agent, they differ from any raw stimuli that hit the perceptual surface (Jarvilehto, 1998a, 1998b; Quine, 1995). That is, because afferent and efferent processes in the organism continuously modulate each other, the individual organism constructs a world in which it also acts. But the world constrains these constructions so that only viable ones survive. Thus, although experienced as objective structures of the world by each individual, these perceived structures cannot be assumed to be the same across individuals. This puts the onus on the researcher to ascertain how the world looks from the position of the individual agent, giving rise to different ontologies, or lifeworlds.[2] This also requires that the cognitive argument be reversed. The central phenomenon is not how individuals come to act in a stable world, but how different individuals come to a consensus that they live in the same world despite individual differences in perception that only sometimes become obvious.
Most of the world that surrounds agents is essentially unrepresented in the gray matter; that is, agents take their worlds for granted as usually stable, so that the world can serve as its own representation (Agre, 1997). For example, one study illustrated that much of the memory of a cockpit can be attributed to the material setting rather than to the gray matter of the pilots (Hutchins, 1995b); other studies show how memory resides in the stories of a community (Orr, 1990) or in institutional arrangements (e.g., Engeström, Brown, Engeström, & Koistinen, 1990). In these instances, memory is located more toward the (social, material) setting pole of the agent-in-setting unit of analysis. That is, the structures of the world as perceived by the agent are the same structures toward which actions are directed. For example, expert (short-order) cooks arrange the physical space of their kitchens and the placement of materials and tools such that these come to embody both memory and the plans for activities to come. This shifts memory allocations onto the setting pole and thereby decreases the load on individual mental (gray matter) processes. Thus, the cooks do not have to keep in mind where things are or what to do next even with multiple customer orders in different states toward completion; furthermore, cooks have certain dispositions to use tools and materials in particular ways that they do not have to represent explicitly (Agre & Horswill, 1997; Kirsh, 1995). Ballard, Hayhoe, Pook, and Rao (1997) showed how short-term memory can be modeled as a deictic (pointer) system to keep track of information which, when necessary, is picked up directly from the environment (setting).
To understand expertise, memory, and efficiency, researchers have to account for the entire cognitive system (agent-in-setting), because aspects of cognitive structure may be anywhere in the agent-in-setting continuum. The kinds of representations used to (computer) model such activities are relational rather than absolute, and arise from the agent's current position and therefore its available horizon. Typically, such relational representations take the form "the-cup-I-am-holding," "the-place-where-I-am-standing," "the-arrow-I-am-pointing-to," etc. (Agre, 1995). Central to any cognitive analysis, therefore, has to be the identification of those elements in the world that are indeed currently salient to each agent. Neurophysiological studies showed that perception, expectations, and attention are strongly related to an individual's developmental history (Jarvilehto, 1998a, 1998b). Therefore, what is salient to any two agents is likely to differ. The differences are probably large for individuals who have not had common experiences, whereas they are likely to be very small for individuals who coparticipated in sets of activities under the same conditions over long periods of time (Bourdieu, 1990; Lave & Wenger, 1991; Quine, 1995).
Each agent constantly adapts. These adaptations are continuous such that the next state of the cognitive architecture happens on the surface of the immediately previous state (Churchland & Sejnowski, 1992). That is, the evolving cognitive system integrates over its own history and marginally changes at its own outer surface. Through interactions with their material and social worlds, agents change their relationships with settings, thereby introducing a temporal and developmental component into cognition. That is, learning is grounded in material and social worlds by means of the interactions which change the structured (patterned) relations in the agent-in-setting unit. In this view, learning is the extension of an individual's possibilities for acting in the world; it is a change of the unit "agent-in-setting." It is constituted by changing patterns of interaction, an ever-increasing resource of experienced situations, and evolving commonsense notions of what the world should be like as one participates to an increasing degree in new communities.

Methodological Frame

Analytic Units

The epistemological framework requires multiple analyses which allow us to understand cognition as a phenomenon that has multiple scales of organization in time, physical and social space, culture, and so forth. As an analyst, I therefore seek cognitive structure of the agent-in-setting unit at macro and micro levels across time and (social and physical) space. My analyses therefore begin with the careful construction of individuals' lifeworlds, including those elements that are taken for granted and those that are currently salient in the activity. At this stage, the unit of analysis is the individual-in-her-world. However, aspects of this world come into focus only when larger units are considered. For example, a series of actions needs to be considered differently depending on whether it was assembled for the purpose at hand or is a practice common to members of the collectivity (group, community, classroom). In the former case, concatenating the actions into an intelligent whole is likely to be a salient aspect of the current cognitive activity, whereas in the latter case, enacting the practice is part of the (tacit) background.
My analyses begin with investigations of activity, discursive and material actions, and relevant (i.e., salient) artifacts. I attempt to reconstruct the emerging activities with respect to the constraints and affordances provided by the artifacts, classroom norms, and representations (linguistic, graphic, etc.) whether these were established by learners as local norms or within the scientific community (of which teacher and textbook are the representatives). Individual agents bring to each situation ranges of prior experiences organized into domains, mental images, common sense (tacit understandings of how the world works), and understanding of language. Each situation takes place in some setting that has material (e.g., artifacts, constellation of objects, etc.) and social aspects (coparticipants, relationship between them). Both the individual characteristics and the settings shape the agents' actions, that is, frame perceptions and interpretations, drive speech and physical acts, and so forth.
I use two heuristics in order to locate structure in the agent-in-setting system across time and space. First, I identify three dimensions where I might find learning, that is, a change in the agent-in-setting unit. Second, I seek to construct the ontology of the agents, the structure of the world as relevant to their afferent (perception) and efferent (action) cognitive activities.

Three dimensions of learning

To better account for the ways in which knowing and learning are constituted in classroom contexts, I developed a framework (Roth & Duit, 1998) which borrowed from Hutchins' (1995a) work on knowing and learning on navy vessels. In this framework, each moment of practice is embedded in three types of development: ongoing activity, individual (discursive) practices, and community (discursive) practices (Figure 1). That is, each moment (e.g., of videotape) can be analyzed in terms of (a) the unfolding activity, its history, contingencies, constraints, etc.; (b) the individual agent with reference to other moments featuring this agent; and (c) the practices of the community that envelops the agent (see also Cobb et al., this issue). The developments along the three dimensions occur at different time scales: ongoing activity is always fleeting; changes in individuals' practices may arise from a particular activity, but are usually tied to recurrent activity in the same setting; finally, changes in the practices of the community parallel the slow developments of other cultural practices (Figure 1). Furthermore, the developments along the three dimensions interact with--and therefore constrain (in both a positive and a negative sense)--each other. For example, the development of the classroom discourse arises from the development of individuals, which arises in turn from the development of activities. Then, the developments of activities and individuals are constrained by developments of the more inclusive dimensions (individual and community for activities; community for individuals). Finally, the scientific community (represented by the teacher and textbooks) also constrains the forms of discourse that develop at each of the three levels (e.g., the teacher-student interactions described below).

Figure 1. Dimensions of development (bold arrows) and constraints (broken arrows) on the various dimensions.

Ontology

In a recent study, I showed the tremendous variations in responses students provided to structurally identical lever problems when aspects of the setting were changed (Roth, 1998c). Thus, I observed significantly different structure in cognitive activity when (a) the lever beam was marked or unmarked, (b) problems were practical or in the form of word problems, (c) students answered in interview settings or paper-and-pencil format, and (d) students engaged in conversation with an interviewer or with a peer. The study showed that what was salient and therefore became an object of cognitive activity changed across the configuration of the assessment. This and other studies (e.g., Roth, 1996a; Roth, McRobbie, Lucas, & Boutonné, 1997a, 1997b) convinced me that in order to understand unfolding activity, I needed to attend to the setting as it was salient in the agent's perceptions and actions.


Figure 2. Four maps that we might use to describe the trajectory of a fisherman in his boat. Although each frame of reference tells us a part of the story, it is the ontology of the indigenous map, representing the world through the fisherman's eyes, that most plausibly explains the trajectory.

Central to the understanding of an unfolding activity, cognition, and actions is the ontology of the world as viewed by each agent. By ontology I mean the ensemble of salient elements each individual perceives, acts towards, and talks about. Most cognitive research presupposes a stable ontology of "problems," setting, utterances, etc. However, an absolute frame of reference, that is, the investigator's domain ontology, may not be the most appropriate for understanding the actions of the research participants, nor does it permit us to understand the representations that are associated with the ongoing activity, or the rationales for doing one thing over another. Rather, the representations are relational, historically-contingent, interactional, and situation-specific and may therefore differ considerably across individuals. To make this clearer, consider the different representations of a fisherman's movements on a river (Figure 2). Whereas all four maps are suitable frames for representing his trajectory, only the fisherman's map related to traditional fishing spots provides an account that explains his activity; yet, when asked, he may not talk about his trajectory in terms of a map. Although the absolute referencing in terms of the Global Positioning System, the channel map, and the geographic map (all stops are at protruding points, but not all points are stops) are suitable external descriptors, they are inappropriate for describing the kinds of representation that the fisherman might have and that actually motivated his actions. (For an extensive discussion of representations in everyday activity see Agre [1997] and Chapman [1991].)
The importance of getting an individual's own ontology of the situation right was recently pointed out in two independent studies of children's reasoning on the balance beam (Metz, 1993; Roth, 1998b).[3] For many years, researchers assumed that children were acting on and towards "weight" and "distance-to-fulcrum." However, Metz and Roth pointed out that both "weight" and "distance" are emerging concepts in the sense that children developed them through their interaction with the materials and in the particular settings. That is, whereas researchers traditionally had assumed children to give incorrect responses, this research showed that children did not respond to weight and distance at all. The very structure of the focal phenomena, and therefore children's ontologies, were different from what had been assumed, leading to a reinterpretation of what had been considered cognitive deficiencies in the past. Getting a sense for the ontology of others takes considerable familiarity with the people and the places they inhabit, paired with a radical disbelief (questioning) of any ontology, however plausible it may seem (e.g., Bourdieu & Wacquant, 1992).

Analytical Processes and Presuppositions

Understanding the three dimensions of learning and individual ontologies requires (a) deep familiarity with the setting and (b) radical disbelief in presumed ontologies of individual agents. I therefore enact (a) ethnography involving long periods of stay in the worlds of interest and many interactions with the people who inhabit these worlds and (b) critical hermeneutic analyses involving long and intensive periods of watching video and reading through texts. First, I may spend 3 or 4 months in the same classroom, attending four 1-hour lessons a week, plus additional time interviewing and planning with teachers, interacting with parents, and participating in staff meetings. This intensive interaction at various levels of school life allows me to experience the lifeworld of my research participants, of the school and classroom as a culture, local practices, ways of interacting, etc. Several of my research programs arose from long-term commitment to the same site so that the understanding undergirding each report arises from 3-year involvements (e.g., Roth, 1995b, 1998a). Second, I spend extended periods of time with the videotapes (often with colleagues who bring different perspectives), radically questioning my own ways of viewing the events in the attempt to reconstruct the ontology salient to each agent in the setting.
As I participate in the situation, all videotapes are transcribed in an ongoing manner--often by myself--so that the text is available in written form during my ongoing analysis. Texts, photographs, and copies of written artifacts are inventoried and scanned to be quickly available through one and the same computer interface. I also play the videotapes through the computer interface and use a stereo system to achieve maximum resolution of the audio channels. Before writing up a study, I spend from weeks to months watching videotapes and reading texts to the point that the entire database becomes a familiar (multi-dimensional) environment with multiple sense-making resources (cf. Greeno, 1991) that allow me to situate cognition. During this phase, I write notes, again using the computer so that the notes themselves become part of the data set.
My analyses, grounded in semiotics and hermeneutic phenomenology, are based on the assumption that reasoning is observable in the form of socially-structured and embodied activity (Garfinkel, 1991; Suchman & Trigg, 1993). In my analyses, videotapes, transcripts, and artifacts produced by the participants are natural protocols of their efforts in making sense of, and imposing structure on, their activities. These protocols constitute the texts that I structure and elaborate in the analyses. When I work with colleagues, we organize our analytic work around the precepts of interaction analysis (Jordan & Henderson, 1995).

Context and Data

To situate the subsequent analyses that exemplify my approach to data analysis, this section describes the participants, setting, data collection, and specific frame for data analysis. The examples for this article derive from a research project which was conducted during an eleven-week unit on mechanics and kinematics topics. The course was premised on the assumption that learning means to achieve a certain level of competence in talking physics (Lemke, 1990; Roschelle, 1992; Roth, 1996c). Thus, I had planned many activities that engaged students in physics conversations. These activities included (a) open investigations of motion phenomena chosen by students according to their own interests, (b) explorations of phenomena in a computer-based microworld (Interactive Physics(TM)), and (c) collaborative concept mapping with the main concept labels of a unit. Students were asked to read relevant chapters in one of the available textbooks (e.g., Hewitt, 1989) on their own, and to complete 6 problems per week. The open investigations of natural phenomena constituted the core of the curriculum; the microworld activities were interspersed once every other week, and the collaborative concept mapping took place once a month. Microworld activities and concept mapping were conceived as contexts in which students focus more on the conceptual aspects of the physics of motion than on the mechanical aspects of implementing their practical research.

Computer-Based Microworld Activities

Interactive Physics(TM) is a computer-based Newtonian microworld in which users conduct experiments related to motion (with or without friction, pendulum, spring oscillators, or collisions). The microworld allows users to represent observables (measurable quantities) in different ways. For example, force, velocity, or acceleration can be represented by means of instruments such as strip chart recorders and digital and analog meters. More importantly, as in Roschelle's (1992) Envisioning Machine, Interactive Physics(TM) allows a superposition of the conceptual representations of these quantities, vectors, and the objects, creating hybrid objects bridging phenomenal and conceptual worlds (Roth, Woszczyna, & Smith, 1996). All student activities in the present study included, at a minimum, one circular object (Figure 3). A force (full arrow) could be attached to this object by highlighting and moving it with the mouse. The object's velocity was always displayed as a vector and students could modify its initial value by highlighting the object, "grabbing" the tip of the vector, and manipulating its magnitude and direction. Students were instructed to find out more about the microworld, especially the meaning of the "arrows," that is, the vectors representing force and velocity. Although students concurrently conducted real-life experiments on motion in which they analyzed distance-time, velocity-time, and acceleration-time graphs, they were not told the scientific names of the "arrows." Some of the prepared activities displayed nothing more than the circular object (including its velocity) and a force. Others required students to manipulate the "arrows" (force and velocity) to hit a small rectangle and throw it off its pedestal. After setting force and initial velocity, students could "run" the experiment. A tracking feature "froze" the motion as if recorded with flash photography.
During the microworld experiment, the cursor took the form of a stop sign, and a simple mouse click stopped the motion. The replay feature allowed the inspection of individual states in the motion of the sphere (on the bottom left of the screen in Figure 3, we can see that the current simulation contained 51 frames).




Figure 3. Interface of Interactive Physics(TM), a Newtonian microworld superposing phenomenal objects (ball) and conceptual framework (i.e., velocity and force vectors).


 

Participants

Forty-six Grade 11 students (41 males, 5 females) from 3 sections of a qualitative Grade 12 physics course participated in this study (20, 15, 11 students, respectively). The students attended a private school in Canada (grades 4-13), which was in its first year of transition from an all-boy to a coeducational institution. For about half of the students, this course was a precursor to the Grade 13 advanced physics course. Most students were not science majors and later pursued careers in business, medicine, law, and politics. I taught all three sections of this physics course. At the time of data collection, I had eight years of teaching experience at the junior high and high school levels (physics, physical science, computer science, and mathematics). My training and experiences include an M.Sc. in physics, laboratory research, and high school teaching certificates for physics, chemistry, and computer science.

Data Collection

Four groups of students--representative of the entire physics course in terms of achievement and gender--were each videotaped at the computer during three 60-minute classroom periods separated by 2-week intervals (physics was allotted 180 minutes/week). The physical configuration of students and recording devices is represented in Figure 4. The descriptions of learning developed in this study are based on the entire data corpus constituted by the tapes and transcripts. For the purpose of illustrating my claims, I selected episodes from one of these groups, Glen, Elizabeth, and Ryan. The three students were in many ways representative of the students I had taught in various public and private schools throughout Canada. They were not "typical science students," did not achieve in the top quartile, and did not enroll in science or a science-related field at the university level. As a group, the three had a preference for agreement, and conflict was not part of their interactions. The three worked together rather well and, although they did not know each other initially, stayed together as a group for the whole school year.




Figure 4. Physical arrangement and recording setup for the Interactive Physics(TM) activities as they would have appeared from above.

The data for the computer activities exist in a larger context of other data collected during the same school year with the same three classes. These data include video records of students' experimental work, of semantic networking activities, and of individual interviews about knowing and learning physics. Furthermore, hard copies of the results of laboratory work and student reflections on knowing and learning in diverse physics activities also entered the database. For the group of three students presented here, the additional database contextualizing the Interactive Physics(TM) study includes 15 reports of independently-conducted laboratory investigations, ten 1-hour sessions of semantic networking, one exam and 3 tests per trimester, 13 essays on knowing and learning, and a series of interviews focusing on physics knowledge and epistemology.

Analysis of Discourse over and about Inscriptions

 
In the course of some conversation and by using words and gestures, speakers make salient certain objects and events within a more complex context. In the process, these objects and events are "foregrounded" whereas the remainder of the inscription recedes into the background. An important component in the analysis of discourse situations is the relationship between talk, inscription (external representation), background, and gesture. In the present case, to analyze what was happening as students interacted with each other and Interactive Physics(TM), I needed a framework to conceptualize where I might find structure in student activities (Figure 5).


Figure 5. Analytical framework for conversation in front of a representational medium (e.g., chalk board, computer).
 
 


For an excerpt from the study explained in more detail below, the data turn into displays such as that featured in Figure 6.
 

Row  Channel    Content
A    video      (two video frames, at reference points 2 and 5)
B    audio      So like this arrow forces it (.) to a certain extent
C    marker     |        ↑        |        |        ↑
D    reference  1        2        3        4        5
E    Δt (s)       (0.80)   (1.03)   (0.10)   (0.47)

Figure 6. Analysis of gesture-discourse relationship during conversations over and about objects and events in a computer-based Newtonian microworld. Timing points are marked relative to utterances (vertical lines) and video frames (vertical arrows). For example, there is a 0.80 second delay between the onset of the utterance "like" [1] and the first of the two video frames [2].


This particular display is subsequently used to construct a relationship between gesture, talk, and their temporal development over shorter (minutes to one hour) and longer terms (4-6 weeks). The analysis proceeds as follows: The present excerpt shows that there is a 1.03-second delay between the deictic gesture (pointing) [2] and the utterance "this arrow" (further made salient by a little 0.10-second jerky movement of the pencil [3-4]); furthermore, the iconic gesture that simulates the force arrow's movement across the screen begins 0.47 seconds prior to the verbal description of the arrow's action on the object designated by the indexical utterance "it." From prior research on gestures and the relationship between gestures and utterances (e.g., McNeill, 1992), I know that these delays are significant; such delays between gestures and discourse are part of a developmental trajectory which ends in the eventual overlap between the two distinct forms of representation embodied in kinesthetic and verbal coding (Roth, 1999). The context of the unfolding activity makes the interpretation of "it" as indexing the circular object very likely. Here, the student appropriately described the action of the arrow as "forcing" the object. Yet he came to consistently use the arrow as "force" only during the subsequent lesson two weeks later (and about 1 hour of further activity). These events are further notable, for they illustrate that even if students make utterances and gestures apparently consistent with the relevant science, there are no mechanisms inherent in the physical world that select these actions over others that may be scientifically incorrect.
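The interval arithmetic behind a display such as Figure 6 can be illustrated with a short Python sketch. This is an illustration only and not the procedure used in the study: the function names `onsets` and `delay` are hypothetical, and the only data are the four intervals between the five reference points reported in Figure 6. The sketch converts the intervals into times relative to reference point 1 and reads off the delay between any two points.

```python
from itertools import accumulate

# Intervals in seconds between consecutive reference points
# (1->2, 2->3, 3->4, 4->5), as reported in Figure 6.
INTERVALS = [0.80, 1.03, 0.10, 0.47]


def onsets(intervals):
    """Return the time of each reference point relative to point 1."""
    return [0.0] + list(accumulate(intervals))


def delay(intervals, start, end):
    """Elapsed time from reference point `start` to `end` (1-indexed)."""
    times = onsets(intervals)
    return times[end - 1] - times[start - 1]


if __name__ == "__main__":
    # Hypothetical usage: list each point's offset, then one span.
    for point, t in enumerate(onsets(INTERVALS), start=1):
        print(f"point {point}: {t:.2f} s after point 1")
    print(f"points 2 to 3: {delay(INTERVALS, 2, 3):.2f} s")
```

Keeping the raw intervals as the stored data and deriving the offsets from them mirrors the display, in which each delay is measured between adjacent timing points rather than from a common origin.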
 

Knowing and Learning in a Physics Classroom

Reconstructing multiple dimensions of learning as outlined in this paper is a complex process that produces analyses exceeding normal journal space allocations. The following sections are therefore intended to exemplify multilevel analyses rather than to provide a complete and coherent argument for all aspects of knowing and learning that the data permit me to make salient. In the following four sections, I present the different structures of cognitive activity made visible by the particular frame chosen.
1. By focusing on the unfolding activity (horizontal axis in Figure 1), I show how students co-construct a description in real time, subject to the history and contingencies of the activity.
2. By focusing on the ontologies of students and teacher, I show (a) how the "same" screen events are perceived differently by students and teacher and (b) how the teacher's (my) interactions with students constrained their perceptions of the on-screen events. (See Figure 1 and the constraints on the development of practices in the three dimensions.)
3. By focusing on different parts of the physical setting (different layers in Figure 5), I show how gestures interact with the visual display, and how they may forebode understandings that verbal discourse reveals only much later. (Here, knowing is understood as distributed across body and setting.)
4. By focusing on physical arrangement, social configurations, and the nature of focal artifacts, I show how these interact to give rise to different participation and discourse patterns, and therefore to what we understand as macrostructures in cognitive activity.
Each of these analyses shows a different aspect of situatedness, none providing a picture of cognition that is complete in itself. Any selection of one of these aspects, however principled, may automatically exclude other, equally principled selections. In my understanding of situativity, we need to account for all of these aspects (and more are possible) to get a sense of what cognition involves and what makes it possible. For example, the social construction described in [1] comes about because of the type of constraints described in [4], and presupposes a convergence in the participants' ontologies (which, by default, they take as shared) [2]. In the unfolding events that lead to students' sense that their understandings are shared (i.e., socially constructed [1]), the gestures make salient particular aspects when read against the background [3], thereby allowing us to understand the ontology underlying the students' actions [1]. For a complete analysis, I zoom through an entire spectrum of (temporal and spatial) frames, though space limitations in research journals usually require a separate presentation of each analysis. I consider any one data selection and reduction as limiting our understanding of cognitive processes.
All episodes selected are representative of other video segments collected around the same moment in time in terms of: (a) the nature of students' discourse, (b) the integration of gestures and talk, (c) the manipulation of objects on the interface, (d) the nature of students' ontologies, and (e) the nature of student-student and teacher-student interactions. Thus, for example, the episode in Figure 6 could be exchanged with that in Figure 7 without a change of the argument. Episodes without video could have easily been enhanced by video off-prints to make claims about the interaction of gesture and scientific talk. The particular episodes featured are therefore a matter of pragmatic choice among many possible alternative episodes.

Social Construction

The students' task was to find out about the relation between the motion of a circular object and the two arrows, and to construct an explanation of how the microworld works. Prior to this episode, the students had already conducted several experiments with different configurations of [velocity] and [force], leading to different curvi-linear trajectories.[4]

velocity (single-line arrow)   force (outline arrow)

At one point, Ryan accidentally detached the force arrow from the object, but the three decided to run an experiment in this new configuration. They discuss the resulting screen display in the following excerpt.
 
G: So when you don't run it with this arrow 
POINTS[force] 
it goes in the same velocity 
TRACES[trajectory] 
R: It just goes in the same direction 
GESTURES[trajectory] 
this arrow, like is initial (2.3)
POINTS[velocity]
the later direction
E: That means it's a constant
G: So like (2.8) this arrow forces it to a certain
POINTS[force]
extent
R: It changes direction after the start

Glen provided a first description in terms of [velocity] as moving "in the same velocity" while his deictic gesture first picked out the arrow, followed by an iconic gesture that traced and therefore made salient the trajectory. Ryan first followed up by describing the trajectory as being "in the same direction" and traced a straight line in the air, and then linked [velocity] to a feature of the initial state in the experiment but, overlapped by Elizabeth, did not complete his statement about the final direction. Elizabeth's statement about something being constant can be read as confirming both Ryan's and Glen's earlier utterances "same direction" and "same velocity." Glen, followed by Ryan, described the action of [force] as "forcing" and "changing direction after the start."
In this episode, the three students produce descriptions commensurable with Newtonian physics. They use gestures and utterances to pick out, and make salient, a limited number of objects ([force], [velocity]) and events (trajectory). These observation descriptions are assembled in a public space, and require both the inscription and the gesture. The gestures allow students to fix the referents of some words, though in this episode, the referent for the deictic term "it" fluctuates and its referents are never clarified. For example, in Glen's description, the [force] acts on "it," presumably the object. However, Ryan's "it changes direction" does not unambiguously pick out whether "it" is [force] that causes some change, [velocity] which changes, or the object which moves on a curvi-linear trajectory. It therefore needs to remain open whether students talked about the arrows or the objects. Furthermore, their observation sentences do not require internal representations (Quine, 1995). We can therefore take the unfolding conversation as something that exists in public space (thus somewhere other than the agent pole) leaving open what memory traces they leave (what students learned), or how this aspect constrains later developments of the conversation.
One might be tempted to infer from this transcript that the three have evolved mental representations consistent with Newtonian physics. For example, Glen might be interpreted as having a representation of [velocity] ("this arrow") as indicating a constant velocity of "it," the circular object. As indicated, we need to radically question our own ontologies and how we attribute them to the agent. Later parts of the unfolding interaction show that the discourse was not stable. In fact, as the conversation unfolds, there is a considerable variation in the designations used for [velocity] (little arrow, big arrow, initial speed, velocity, initial speed, velocity, force, effort, strength, speed, strength, speed, direction, speed & direction, velocity) and [force] (little arrow, big arrow, time set, time, direction, time & direction, velocity, redirection, gravity, force, gravity, gravity). The two lists show that the same labels were used to denote different arrows. In this sense, the above observation sentences constructed by the students were constructed in the context, contingent on the computer configuration, history of the emergent conversation, and students' perceptions. Existing family resemblances between scientific discourse and vernacular all too easily lead researchers to make assumptions about representations and conceptions that are not viable representations of students' knowing.
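The overlap between the two lists of designations can be checked mechanically. The following Python sketch is an illustration added here, not part of the original analysis; it takes the labels exactly as they are listed above, and the variable names are assumptions introduced for the example.

```python
# Designations students used for each arrow, as listed in the text
# (repetitions in the original lists are collapsed by the sets).
velocity_labels = {
    "little arrow", "big arrow", "initial speed", "velocity", "force",
    "effort", "strength", "speed", "direction", "speed & direction",
}
force_labels = {
    "little arrow", "big arrow", "time set", "time", "direction",
    "time & direction", "velocity", "redirection", "gravity", "force",
}

# Labels applied to both [velocity] and [force] at some point.
shared_labels = sorted(velocity_labels & force_labels)
print(shared_labels)
```

Of the ten distinct designations in each list, five appear in both, including "velocity" and "force" themselves.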
The episode shows us how students coproduced a description of an event in the sense that all observation sentences highlighted something as being constant when one arrow ([force]) was disconnected from the object, and that there were changes in the direction when the same arrow was attached to the object. I know that students, out of these uncertain beginnings, developed a consistent way of describing and explaining the phenomena at hand (Roth, 1996c). However, this development did not occur independently of other events in the classroom. Rather, the interactions between myself (teacher) and students brought about changes in the way students perceived, and talked about the events.[5]
We might assume that conceptions drive what students say. Their talk is then considered as a medium of externalizing thoughts and conceptions from the computational hardware to the public forum. Such a view is inconsistent with the data presented here because the considerable variations in the discourse would require the assumption that their "conceptions" constantly changed. Based on my epistemological frame, I make the less stringent assumption that students produce situated observation sentences out of their interactions in the setting. This does not necessitate representations, for the relevant elements (image to be described, language, gestures, etc.) can be picked from the setting. These descriptions are ephemeral and may be forgotten in the next instance so that subsequent sentences may in fact be incompatible when studied by the researcher. On the other hand, observation sentences can also be stabilized within the group and then become conversational results that students remember, and which therefore last beyond the immediate activity.

Perceiving Forces

Science educators and researchers of cognition often assume that interacting with materials (diagrams, texts, graphical models, tools, instruments, physical phenomena) provides students with relatively unambiguous visual experiences. All students really have to do is look and see, or infer, the same patterns available to those with a scientific background. Our own studies in a variety of settings and cultures show that this is not the case (Roth & Duit, 1999; Roth, McRobbie, Lucas, & Boutonné, 1997a). Even the perceptions of carefully staged teacher demonstrations were radically different and a function of students' expectations (Roth, McRobbie, Lucas, & Boutonné, 1997b). Elicitation of these differences and provisions of constraints that afford students to make observations relevant to understanding the scientifically correct framework are therefore crucial elements of teacher-student interaction. (These constraints operate at all three levels of learning as illustrated in Figure 1.) The particular form of teacher-student interaction, and the affordances of the (computer-animated) inscription provided constraints that allowed students to modify their observation sentences, and therefore to reconstruct the ontology of the focal events.
Because the three students had not come to a consistent description after a considerable exploration time, I decided to set up an experiment for the three students so that they could construct an analogy between their lived world and the microworld. I oriented the force so that it would push the object downward, but oriented the initial velocity upward.
 
I: What if you had that point up (3.1) 
GRABS[velocity] POINTS[up] 
and this one would be pointed like this? 
GRABS[force] POINTS[down] 
G: It would go straight down 
R: Yeah, it would go downward 

Here, Ryan and Glen responded to my "What if. . .?" question by stating the hypothesis that the object would immediately descend ("straight"). They then stated that they saw the object going down before Elizabeth contradicted them:
 
 
I: But first?  
E: I think it went backwards first though. 
R: The initial velocity went the way the little arrow goes. 
E: Didn't it go backwards first and then go forwards? 
R: I think so. 

Coincident with my questioning "But first?," Elizabeth contrastingly ("though") described the object as going backward first. This contrast, together with "first" can be read as an oppositive description to that provided by Glen and Ryan. Ryan responded by describing the initial movement in the direction of the "little arrow," to which Elizabeth reiterated her perception as a contrast ("didn't it...") to what she understood Ryan as saying. Subsequent to this interaction, I asked students to relate this phenomenon to something in their everyday life to which the three responded with descriptions of a returning hula hoop, yo-yo, and object thrown in the air. Out of this, Elizabeth suggested that [force] represented something like gravity.
Here then, Glen's and Ryan's ontologies were different from mine and from that valued in the scientific community. That is, they perceived the object as moving straight down contrary to my own expectation and observation that the object should move upward before it descends. Not perceiving the initial upward motion is significant, for it does not allow an understanding of the relationship between velocity and forces in the early part of the trajectory, and therefore a more general theory of forces and the motion of objects. Therefore, despite setting up what I considered a crucial experiment, the move was unsuccessful in the very first instance. This changed with my question and Elizabeth's different observation description.
Whereas the students appear to have come to an agreement that the object moved up before it descended, the episode does not make clear whether they actually observed the [velocity] change: from being pointed upward it decreased to zero, and then increased again pointing downward. In fact, the subsequent episode shows that during the moment of teaching, I interpreted students' talk as not having made this observation and therefore ran the same experiment repeatedly in slow motion until the students' observation descriptions included not only the object but also the two vectors. We therefore note that despite the very small number of elements constituting the microworld--a circular object and two arrows at the focal point, and the remainder of the interface in the immediate background--students did not perceive it in the same way that I, a physicist and physics teacher, perceived it. Students had a different ontology of the microworld: although it was experienced as real by the students, it was an ontology that was inconsistent with a Newtonian explanation. However, the constraints provided by different observations within a student group, and those provided in interactions with the teacher, are crucial for establishing the phenomenal backdrop to any correct understanding of the theory which students are to learn according to the curriculum.
Because of my familiarity with Interactive Physics(TM), the microworld was a tool that permitted me to engage students in a situated inquiry. Particularly the slow-motion option afforded students to perceive the upward motion (and upwardly pointing [velocity]) during the initial phase of the trajectory.
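The physics behind this experiment can be sketched numerically. The following Python sketch assumes a constant downward force acting on an object with an upward initial velocity; the mass, force, and initial velocity values are hypothetical and are not taken from the classroom session or from Interactive Physics(TM). It uses the standard constant-acceleration relations v(t) = v0 + (F/m)t and y(t) = v0 t + (F/m)t^2/2.

```python
# Hypothetical illustration values (not from the study).
# Positive numbers point upward, negative numbers downward.
V0 = 4.0      # initial velocity, pointing up
FORCE = -2.0  # constant force, pointing down
MASS = 1.0


def state(t, v0=V0, force=FORCE, mass=MASS):
    """Return (height, velocity) at time t under a constant force."""
    a = force / mass                      # constant acceleration
    velocity = v0 + a * t                 # v(t) = v0 + a t
    height = v0 * t + 0.5 * a * t ** 2    # y(t) = v0 t + a t^2 / 2
    return height, velocity


# Time at which the upward velocity has shrunk to zero (the turning point).
t_turn = -V0 / (FORCE / MASS)

for t in (0.0, 1.0, t_turn, 3.0, 5.0):
    y, v = state(t)
    print(f"t={t:.1f}  height={y:+.2f}  velocity={v:+.2f}")
```

With these values the object rises until the turning point, momentarily has zero velocity, and only then descends. This is the sequence that Elizabeth described and that Glen and Ryan initially did not report.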

Gesture and Scientific Talk

Interesting aspects of cognition can also be found in the relationship between gestures, talk, salient objects (figure), and the ground. Thus, focusing (zooming in) on this relationship allows us to understand how cognition is distributed across the agent-in-setting unit. When students engage in practical science activities, their gestures often arise from, and abstract, earlier manipulations of objects (Roth, in press). Furthermore, manipulations and gestures preceded and were an integral part of the construction of conceptual categories related to simple machines. In this section, I provide one analysis of the relationship between talk, gesture, and setting. Here, gestures obtain significance in two considerably different respects. First, they assist in grounding talk to specific objects present in the conversational situation and thereby constrain the ways utterances can be used. As I pointed out earlier, there was considerable variation in the words students used to name or categorize the different elements in the microworld, and in understanding how these elements (object, arrows) interact. In this seeming chaos of the same words being used to denote different objects, deictic and iconic gestures were crucial to establishing common ground, finding appropriate observation sentences for the situation at hand, and ultimately, arriving at a theoretical discourse that was consistent with Newtonian physics. Second, this episode is but one instance from the developmental trajectory of a student where the gestures provide iconic descriptions of the events long before the student actually masters the appropriate scientific discourse. In this situation, gestural (and verbal) deixis was crucial in coordinating utterances, gestures, and the phenomena in the microworld.


 
Wouldn't the length of the 
|(1.47)[arrowup] (2.00) 
 
arrows (1.60) Since that arrow 
[arrowup] 
 
`s longer the velocity is higher 
(1.47) [arrowup] (0.33) |(0.10) 
 
that's 
[arrowup] (0.20) 
 
why:: it's 
[arrowup] (0.53) 
 
pushing it that'a way. 
| (0.83) [arrowup] 

Figure 7. Excerpt from the conversation in which Glen gestures pushing, parabolic trajectory, and changing velocity two weeks (lessons) prior to verbally expressing the same relationships.



 

At the moment of the episode, the three students still had no grasp of what the arrows stand for and how they related to the moving object. The students had previously affiliated them with time, energy, time step, and many other lexical items. Here, then, Glen attempted another description and explanation of what they just observed (still visible in the top left slide). His utterances (Figure 7) were accompanied by the gestures of both hands which enacted the arrows and their behavior as he had seen it previously. Glen had held his right hand with fingers parallel to the outline [force] arrow--in the form more clearly seen in the second frame--for 3.47 seconds prior to specifying its referent in Frame 2 ([1]-[3] in Figure 7). He then made another, 0.10-second circular gesture which marked the right hand while uttering "that arrow" that immediately preceded the causal meaning unit "that's why it is pushing it. . ." Before he voiced "the velocity" ([4]), his left hand can already be seen at the top left of the image, held parallel to [velocity]. In Frame 3, both hands are visible: the right parallel to [force], the left parallel to [velocity]. In the next frame, the right hand was already "pushing" against the left hand which is moving off the frame to the left. This movement continues to the end of the sentence and out of the video frame. This pushing motion of the right hand began 0.83 seconds before the associated lexical affiliate "pushing." Here, the shape of the object's trajectory (visible in Frame 1) which he attempted to explain was already completely described by the iconic gesture (and trajectory) of his left hand. The episode is complex because there is one word "arrow" but two arrows on the monitor, and the same indexicals "that" and "it" occur repeatedly but refer to different objects and have different functions.
"That" appears three times, and each time not only the referent but also the function is different. In the first instance, "that" ([3]) has a deictic function designating a particular arrow standing in opposition to the speaker (distal use). Coinciding with the utterance, the right hand, which had moved to the right, came to a sudden stop. As can be seen from Frame 2, the fingers of the right hand stand parallel to [force]. This finger position, the noticeable (abrupt) stop of motion, and the coincident utterance "that arrow" make it reasonable to assume that the right hand models [force]. The listener can draw further confirmation for this interpretation from the causal connection between "that arrow" and [force] because it is the one that the three students had previously manipulated, whereas [velocity] only changed as a function of their action.
In the second instance, "that" ([6]) introduces the causal consequence ("that's why") of the hand arrangement he had set up and described in the previous part of the utterance; "that" falls at the beginning of the gestural trajectory which iconically re-represents the earlier visible trajectory (Frame 1). Finally, in the third instance, "that" is linked to "way," the immediately preceding trajectory ("way") enacted by the gesture. In vernacular, "that way" most frequently expresses a specific direction. Here, however, "that'a way" together with the curved motion of the hand, when read against the ground of the earlier curvi-linear motion of the object and the corresponding positioning of the arrows, highlights not only the existence of the trajectory but in particular its curvi-linear shape.
The indexical term "it" was used twice, but when the gesture is viewed against the microworld in the background, the two referents in "It's pushing it. . ." can be disambiguated. The utterance occurred while the right hand followed, fingers pointing to, the left; heard together with "It's pushing it," the right can be understood as literally pushing the left hand (Frames 3-6). Here, the first use has as referent the hand/arrow which is pushing (enacted by the hand) and the second occurrence has as referent something that is being pushed which, in this case, could be the second arrow/left hand, or the object.
At the time of this episode, Glen (like his two peers) did not yet describe the arrows in scientific terms, that is, as force and velocity. He used the appropriate scientific (verbal) discourse only two weeks later during the subsequent lesson with the microworld. However, in the present episode, his gesture--when understood as a description of the relationship between the concepts of velocity and force--was consistent with scientific practice. He characterized the action of the outline arrow as "pushing," which is a vernacular form of describing forces. Finally, he associated the longer pushing arrow with a resulting higher velocity. Here, the referent of "velocity" is not completely clear and two readings are possible. Because the utterance coincided with the positioning of the left hand, "velocity" can be heard as the referent to the left hand: therefore, the longer force (right) arrow pushes more and therefore leads to a longer velocity (left) arrow. But the fragment "Since that arrow's longer the velocity is higher" could also be interpreted such that the longer right arrow is equivalent to a higher velocity, in which case "velocity" would have been anchored in the right arrow (incorrectly so from a scientific perspective). However, the nature of the referent for each of the two hands was disambiguated by their position in space in the course of the motion. The fingers of the right hand kept a constant direction just as the outline (force) arrow, whereas the left hand changed direction, though less rapidly so, in the way the single-line arrow did previously. Thus, Glen's gestural description and explanation of the events was consistent with scientific practice long before his verbal explanations.
In this episode, gestures, animated diagrams, and words were deeply integrated. (Though there are some studies related to interaction of gesture and speech in a variety of non-motion domains, the role of gestures in scientific and mathematical discourse largely remains unexplored.) That is, the structure in the activities arises from structure in each of the levels so that we can view cognition as distributed across the agent-in-setting unit. Any one considered by itself does not help us understand or infer the moment of practice. Taken as a whole, gestures, words, and diagrams (both topic talk and background to gesture) make a lot of sense. Because we have to consider these elements together, it makes sense to speak of cognition as being situated. The structure and coordination of the actions make sense if considered in this particular setting.

Setting Effects

Additional cognitive structure in the agent-in-setting unit may be identified if one zooms out and considers the macro aspects of the setting. That is, patterns that we associate with cognition can be identified when we investigate representational artifacts, social configuration, physical arrangements and the interaction between these elements (Roth, Woszczyna, & Smith, 1996; Roth, McGinn, Woszczyna, & Boutonné, in press). Such elements are the concerns of recent research in workplace studies, but are seldom addressed in educational research. However, my studies revealed some of the mediating effects and interactions of representational artifacts, social configuration, and physical arrangements on student participation during science conversations and on the form and content of these conversations. That is, structure in the activities (and therefore cognition) arises from structures at a more global consideration of "setting." In the present situation, we might ask what the role of computers was in the coordination of the groups, how group size constrained[6] the development of the ongoing activity, or how the physical arrangement mediated the participation.
We have already seen how the interface provided students with a context that facilitated their mutual orientation to each other and the joint problem. Through such mutual orientation to objects and talk, students coordinated their utterances and gestures with the microworld objects and events, allowing them to make sense and evolve common observation sentences of, and explanations for, the microworld phenomena. But the physical and conceptual nature of the interface also interfered with student interactions in two important ways. First, Interactive Physics(TM) frequently constituted a tool that is "unready-to-hand" (Brown & Duguid, 1992; Winograd & Flores, 1987). In contrast to a transparent tool which can be used without cognitive effort, a tool that is unready-to-hand draws the user's attention to itself, and thus away from the real problem to be solved. Although considered "user friendly," the interface proved to be complex and required more time to learn its operation than could feasibly be made available in the context of the present physics course. Second, when there were more than 2 students, the physical arrangement of people and computer organized interactions in such a way that it curtailed the mutual orientation of the students.
The interface can be considered a tool for exploring the microworld and, by means of this activity, for learning physics. However, confirming similar studies of human-computer interaction (e.g., Suchman, 1987; Winograd & Flores, 1987), this study shows that tools constrain actions in some ways, but can be interpreted in multiple ways and therefore do not embed unambiguous meanings. For example, students in this classroom often interpreted software feedback in ways unintended by the designers. In one situation, Ryan tested a configuration of object and arrows, and, after the object had raced off the screen, received as feedback the message, "Object velocities are high for this simulation; reduce time step for greater accuracy." The three students subsequently denoted [force] with "time step." This was a surprising interpretation which appeared to be completely off the wall. The conversation becomes more understandable when we consider a larger time frame and both humans and the machine. That is, Table 1 allows us to attribute particular aspects of the structure of interaction (and therefore of cognition) to users and software; that is, to different locations across the agent-in-setting unit.

Table 1. Machine and human perspectives on the unfolding activity

THE USERS: Not available to Interactive Physics(TM) | THE USERS: Available to Interactive Physics(TM) | INTERACTIVE PHYSICS(TM): Available to the users | INTERACTIVE PHYSICS(TM): Design rationale
E: Pull it out / G: So pull it now go that way | R INCREASES [force] TURNS[force] | | Object-oriented manipulation of physical variables.
R: Oh, what did I do? / G: Cancel, Oh there we go, leave it, yeah. Alright, now push it back, keep connected to the back, now run it. | R INCREASES [force] TURNS[force] | | Object-oriented manipulation of physical variables.
G: OK run it, oh baby! yes | R STARTS[experiment] | | Given position, initial velocity, force, mass, calculate and display trajectory.
 | | DISPLAY PANEL: Object velocities are high for this simulation. Reduce time step for greater accuracy. | High object velocities along trajectory cause large position changes, cause inaccuracies in trajectory and recalculation of velocity, acceleration
E: Which one's the time step? / R: It's that big arrow / G: Oh yeah the big arrows time, OK. I comprehend. | | |

Within the students' horizon, the display panel message followed their immediately preceding lengthening of one arrow (force). As a result of these actions, the software makes available a trajectory, followed by the panel message. Thus, within the students' interpretive horizon, the message was an immediate consequence of their previous action of lengthening the arrow. The word "reduce," when viewed in the context of the previous lengthening, was used as a resource to relate the subsequent "time step" to the manipulated arrow. However, the interpretive frame of Interactive Physics(TM) is different. It was designed to run and display the experiment given a particular specification of the relevant variables (mass, velocity, position, force). The message, based on the size of velocity somewhere along the trajectory, was designed to indicate an action that allowed greater accuracy given the user specifications. Thus, rather than basing its feedback on the history of the interaction or on the specified size of the variables, the system starts with an aspect of the simulation not directly available on the interface and often used in default mode, and checks whether the simulation is possible with a modified time step. In the hands of a competent user who is familiar with the design rationale and simulation practices, however, the message is likely to be interpreted differently (e.g., in my own case).
Earlier analyses showed how students were enabled to use deictic and iconic gestures that grounded their utterances, and, when viewed against the interface as background, helped the speaker to make salient those aspects relevant to his explanation. However, as can be seen from Figure 4, when there are 3 or more individuals oriented toward the interface, there are space constraints on possible physical configurations. Whereas all members could see the gestures made against the background by those sitting close to it, the same affordance did not exist for the gestures of other participants. Thus, while the physical setting did not preclude participation in the conversation, it did preclude anchoring functions of gestures. However, because gestures are central to scientific laboratory talk (Lemke, 1998; Suchman & Trigg, 1993), not having equal access to the representational medium actively interferes with learning (Roth, 1996d). The point here is that not being able to handle the computer input is far less important than the exclusion from the on-going conversation because of limited access to a different mode of communication. Those students who are excluded are likely to engage in activities unrelated to the task or subject, thus to engage in "off-task" activities.
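Why a reduced time step gives greater accuracy can be illustrated with a minimal Python sketch. It assumes a simple explicit Euler scheme, a common textbook integrator; the source does not say which numerical method Interactive Physics(TM) uses, and the force, mass, and time values are hypothetical.

```python
def euler_position(dt, total_time=2.0, v0=0.0, force=50.0, mass=1.0):
    """Integrate 1-D motion under a constant force with explicit Euler.

    Hypothetical values; not the actual Interactive Physics(TM) algorithm.
    """
    steps = round(total_time / dt)
    x, v = 0.0, v0
    a = force / mass
    for _ in range(steps):
        x += v * dt   # position advances using the velocity from the step start
        v += a * dt   # velocity is recalculated afterward
    return x


# Exact position for motion starting at rest: x = a t^2 / 2
exact = 0.5 * 50.0 * 2.0 ** 2

for dt in (0.5, 0.1, 0.01):
    error = abs(euler_position(dt) - exact)
    print(f"time step {dt}: position error {error:.2f}")
```

In this sketch the position error shrinks in proportion to the time step and, for a fixed time step, grows with the acceleration and thus with the velocities reached. Lengthening the force arrow therefore makes such a warning more likely, even though the time step is a separate simulation setting and not a property of the arrow.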
Compared to the previous sections, the present analyses focused on learning in a broader frame considering how physical arrangements, (size of) social configurations, and nature of focal artifacts interact and affect conversational and participatory patterns. This broader focus then leads us to construct different aspects of cognition. Rather than focusing on mind alone, I find it useful to look at and describe events as they emerge from agent-setting ensembles. That is, I conduct my analyses from the perspective of an irreducible unit of analysis constituted by being-in-the-world which forces us to consider all events as acting-in-settings.

Discussion

The different analyses of knowing and learning in the physics classrooms provide different takes on the structure of activity, and therefore of intelligent action of agents-in-setting and being-in-the-world. The analysis of an individual's gesture and talk over and about inscriptions shows how deeply integrated these are. Gesture, talk, or inscription taken by itself or even in pairs provides sufficient grounds to predict the third. Furthermore, the changing relationship of gesture and talk over time also suggests that, for the individual, the nature of the display changes. At one end, there were arrows and a circular object. At the other end, there were "velocity" and "force" as vectors that had different relations to the object. When we consider the phenomenological being-in-the-world which is continuously transformed through the experience, we can always break out a part, the agent, setting, or relation between the two and see that they have changed. However, I suggest that we maintain the cognitive unit of analysis and always consider the agent-in-setting in its entirety with the possibility to locate structures in activity anywhere between the two poles. By changing focus and by zooming between levels, different structural grain becomes visible; but it is always part of the overall picture. What the relevant setting is cannot be answered a priori but is, because of the contingencies of perception and attention, an empirical question. By adopting such a unit of analysis, researchers therefore actively situate cognition.

Layered Analysis and Zooming

Central to my approach is the use of multiple levels of analysis, which reveal different aspects of a more general phenomenon that I call cognition. To locate the structure of cognition, we have to do analyses at multiple levels, which requires zooming. The question then is whether the phenomena at one level explicate phenomena at the next level. This need not be the case. To explain, let me draw on natural phenomena as an analogy. From recent work in non-equilibrium thermodynamics we know that self-organizational phenomena observable at one level and under certain conditions cannot be explained by the behavior of the system under different conditions (e.g., Prigogine, 1980). Thus, when individuals come together to work on collective tasks, each unit of agent-in-setting is different, for the setting is different. In the past, I have used the analogy of a network in which various actors, human and non-human, individual and social, are connected to give rise to a cognitive system (Roth, 1998a). At some chosen level, actors are taken as black boxes. However, each actor can in turn be regarded and analyzed as a network, constituted of actors taken as black boxes. The network analogy therefore is self-same in the way we have become accustomed to from fractal phenomena. Depending on our current level of analysis, we observe patterns with their own colorations and structures that will change as we zoom in and out.
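The self-same character of the network analogy can be illustrated with a small data-structure sketch. It is offered only as an illustration of zooming, not as a model used in my analyses; the class Actor, the function view, and the example actors are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Actor:
    """An actor is a black box that may itself be opened up as a network."""
    name: str
    parts: List["Actor"] = field(default_factory=list)


def view(actor, depth):
    """Return the names of the actors visible after zooming in `depth` levels.

    Actors at the chosen level are treated as black boxes: their internal
    networks are not unpacked any further.
    """
    if depth == 0 or not actor.parts:
        return [actor.name]
    visible = []
    for part in actor.parts:
        visible.extend(view(part, depth - 1))
    return visible


# Hypothetical example: a classroom containing one student group and a teacher.
group = Actor("group", [Actor("student A"), Actor("student B"),
                        Actor("student C"), Actor("computer interface")])
classroom = Actor("classroom", [group, Actor("teacher")])

if __name__ == "__main__":
    for depth in range(3):
        print(depth, view(classroom, depth))
```

At depth 0 only the classroom is visible; at depth 1 the group and the teacher appear as black boxes; at depth 2 the group is opened up into its human and non-human members. The same function applies at every level, which is the sense in which the analogy is self-same.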
Different foci of analysis also require what are considered different methodologies. The study of gesture-talk-ground coordination requires video records and the possibility of precise timing. At the same time, if we are interested in developmental changes, these video records have to span considerable periods. Furthermore, these developmental changes occur within larger frames such as the particular course students are enrolled in, or even larger units including the out-of-school worlds. Here, then, anthropological studies are needed that draw on ethnography, participant observation, or apprenticeship as methods for constructing an understanding of culture and groupings. Most importantly, because engaging in an activity is different from talking about one's engagement in an activity, most of my databases are constituted by large amounts of video data showing people in activity rather than by interviews about activity.[7]

Zooming and Observables

With a very narrow frame, I focused on an individual, his utterances and gestures over and about a computer-animated event. Such an analysis reveals the nature of the relationship embodied in the unit of agent-in-setting and the experience of being-in-the-world. Gesture, words, and world coproduce each other. What we recognize as cognition is what the coincident images of an iconic gesture and the shape of a trajectory create for the analyst spectator. Words and deictic gestures pick out or leave underdetermined particular ways of cutting the focal area into objects and events, allowing the analyst to reconstruct what an individual's ontology might have been.
When the analytic frame is opened up, and several individuals are analyzed as a collectivity, new cognitive phenomena come into focus. Multiple beings are engaged in constructing a common world, where their respective observation descriptions are recognized as being the same. Learning then also becomes a social phenomenon, and the question to be dealt with is which traces of the activity change the individual agent-in-setting unit, and how. Here, my analysis showed how students come to construct a common lifeworld. When their respective observation descriptions are viewed by each other as compatible, there appears to be what I have called interactive stabilization (Roth, 1996c; Roth & Duit, 1998).[8] Because of their common condition and the task to arrive at a collective response, students come to experience (perceive, act on, describe) the focal objects in ways that they recognize as shared. It is often in the conversation as a collective phenomenon that new "conceptions" are worked out before each individual seems to subscribe to them. Thus, in the episode discussed here, the three students collectively arrived at a description for situations where [force] is not acting on (disconnected from) the object. Only from that point on did each of the three individuals consistently refer to the object on a straight trajectory when [force] did not act on the circular object. They each had appropriated, from the publicly accessible conversational situation, a new way of talking about the phenomena at hand.
The episode featuring an interaction between students and myself (teacher) highlights two important elements. First, students' ontologies of objects and events may be significantly different from those of the scientist and differ even among each other. Glen and Ryan expected and then perceived the object as immediately going downward. Elizabeth perceived an upward motion that preceded the downward motion. Rather than interpreting such differences as a defect or a cognitive deficiency, I interpret them as a consequence of the interaction of present ways of organizing the world and the stimuli that arrive at the sensory surface of each individual. It is simply one form of patterned activity of an agent-in-setting. Second, even the orientation (attention) to the world is a function of the current state of the cognitive system (being-in-the-world). From the cognitive scientist's perspective, the issue then is to understand the kind of experiences that allow the cognitive system to change in particular directions (i.e., pursue a particular trajectory), and how these changes come about.
As part of the commitment to being-in-the-world, the setting itself becomes part of the analysis. In my final example the analysis again kept agent and setting in focus concurrently rather than letting one slip in favor of the other. This concurrent focus on human actors and computers and their interaction is embodied in the way Table 1 was constructed. The analysis of human-computer interaction also makes clear why I am little interested in analyzing what a computer can record as having occurred. What is available to the computer is only a small, important, but very partial slice that underdetermines what is salient in the world of the users. Again, I am interested in phenomena that have as analytic unit user-user-computer interactions, that is, users' interactions over and with the computer interface. On the other hand, the mapping from machine states (structures) to a priori assumptions of user intents (structures in mental activity), on which the success of certain interactions such as that in Table 1 depends, would lead to trouble (cf. Suchman, 1987).
The analytical unit does not need to be constrained to groups as I have done here for reasons of space limitations. Elsewhere, I described phenomena at more global levels than any of the examples provided here. In one study, we confirmed the hypothesis that a different physical placement of the same individuals in the same social configuration (whole class activity) leads to different forms of participation in discourse and even in the nature of the discourse contributions (Roth, McGinn, Woszczyna, & Boutonné, in press). In another, we documented the interaction of changes in classroom discourse with the development of group activities, and changes in the discourse of individual students (Roth & Duit, 1998). Learning therefore arose from phenomena at the levels of activity, individual, and classroom which mutually influenced each other.

Situating Situated Cognition

Theory, method, and phenomena cannot be separated. My methodology, to be useful, has to be sensitive to the nature of the phenomena in the theory. Thus, because the theory uses agent-in-setting as unit of analysis--that is, counts as the cognitive system the individual, its lifeworld, and the patterned forms of activity in the transaction of the two--it makes little sense to look for cognitive phenomena (structure) independent of the setting. But in this, cognition appears to the observer as a situated phenomenon. That is, cognition is not just situated in the sense that the intelligence in activity arises from the agent-in-setting unit, but also through the investigator's commitment to situating cognition. Cognition is situated because investigators have made the choice of a particular unit of analysis which actively situates cognition. The converse is also true. Researchers who want to confine cognition to the gray matter will attempt to control all context and therefore will not be able to notice how patterns in the setting contribute to cognition under everyday circumstances. Furthermore, even in the most controlled contexts, researchers cannot separate people from the social, cultural, and historical contexts that led them to engage in particular discourse and other representation practices in the first place.
In traditional cognitive science, the external world, the stimuli to which research participants are exposed, is assumed to be constant, an objectively available world. Computer models kept track of objects by assigning them Cartesian coordinates and orientations that had to be tracked for every object and subject in the model world. My research begins with a different commitment and supposes that each individual agent acts in a different world, its lifeworld. From this perspective, it has to be shown how the stability and sameness of the worlds of individual people arise in the first place. At the most fundamental level, each newborn always and already comes into a world shot through with meaning. As children learn in (adapt to) this world, they acquire a basic set of "common sense," a basic way of cutting up the world into objects and events, and, with basic observation categoricals, the roots of theory for how and why the world operates the way it does. As they are exposed to school activities and different subject matters, they learn to parse the world in new ways (i.e., they develop new ontologies). The activity of researchers is to situate cognition; they do so (explicitly or implicitly) by cutting the world of cognition along some set of joints. For me, these joints are determined by what is salient to the intelligent agent-in-setting rather than by the casing of the brain. Situating cognition is therefore the willingness to open up the analytic frame, from covering things that might be found between the ears and underneath the skull to the patterned and structured phenomenon of being-in-the-world. Some cognitive scientists have made quite explicit this redefinition of cognition, by choosing the cockpit of an airplane as the analytic unit rather than the pilots' minds (Hutchins, 1995b) or by examining and modeling the lifeworld of a short-order cook (Agre & Horswill, 1997).

Open Questions

By now, a decade after Jean Lave's (1988) and Lucy Suchman's (1987) seminal publications that laid the groundwork for expanding cognitive units of analysis, a number of investigations in educational settings have explored the usefulness of regarding cognition as situated. Too often, however, educators have attempted to provide microlevel descriptions without considering more overarching temporal and physical constraints on the activities. For example, we now need to ask, "How does agent-in-setting (being-in-the-world) change in the course of activity?," "Which aspects of the cognitive unit are transported to new settings?," and "What are the long-term effects of individual activities?" As researchers, we may approach these tasks by asking how much overlap we can observe when we conduct investigations of the type agent_i-in-setting_j for all pairs (i, j) that are of (theoretical) interest. In the examples provided here, different students contributed to stabilizing particular observation sentences. It should also be of interest to find answers to questions such as, "How are such co-constructed sentences eventually appropriated by individuals?," and "How do individuals arrive at using these observation sentences for their own intentions even in the absence of the other group members?"

References

Agre, P. E. (1995). Computational research on interaction and agency. Artificial Intelligence, 72, 1-52.
Agre, P. E. (1997). Computation and human experience. Cambridge: Cambridge University Press.
Agre, P., & Horswill, I. (1997). Lifeworld analysis. Journal of Artificial Intelligence Research, 6, 111-145.
Anderson, J. R. (1985). Cognitive psychology and its implications. San Francisco, CA: Freeman.
Ballard, D. H., Hayhoe, M. M., Pook, P. K., & Rao, R. P. N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20, 723-767.
Baumgartner, P., & Payr, S. (Eds.). (1995). Speaking minds: Interviews with twenty eminent cognitive scientists. Princeton, NJ: Princeton University Press.
Bourdieu, P. (1990). The logic of practice. Cambridge, UK: Polity Press.
Bourdieu, P. (1997). Méditations pascaliennes [Pascalian meditations]. Paris: Seuil.
Bourdieu, P., & Wacquant, L. J. D. (1992). An invitation to reflexive sociology. Chicago, IL: The University of Chicago Press.
Brooks, R. (1995). Intelligence without reason. In L. Steels & R. Brooks (Eds.), The artificial life route to artificial intelligence: Building embodied, situated agents (pp. 25-81). Hillsdale, NJ: Lawrence Erlbaum Associates.
Brown, J. S., & Duguid, P. (1992). Enacting design for the workplace. In P. S. Adler & T. A. Winograd (Eds.), Usability: Turning technologies into tools (pp. 164-197). New York: Oxford University Press.
Churchland, P. S., & Sejnowski, T. J. (1992). The computational brain. Cambridge, Mass: MIT.
Dreyfus, H. L. (1992). What computers still can't do: A critique of artificial reason. Cambridge, MA: MIT.
Edwards, D., & Potter, J. (1992). Discursive psychology. London: Sage.
Engeström, Y., Brown, K., Engeström, R., & Koistinen, K. (1990). Organizational forgetting: an activity-theoretical perspective. In D. Middleton & D. Edwards (Eds.), Collective remembering (pp. 139-168). London: Sage.
Garfinkel, H. (1991). Respecification: evidence for locally produced naturally accountable phenomena of order*, logic, reason, meaning, method, etc. in and as of the essential haecceity of immortal ordinary society, (I)--an announcement of studies. In G. Button (Ed.), Ethnomethodology and the human sciences (pp. 10-19). Cambridge: Cambridge University Press.
Gilbert, G. N., & Mulkay, M. (1984). Opening Pandora's box: A sociological analysis of scientists' discourse. Cambridge: Cambridge University Press.
Greeno, J. G. (1991). Number sense as situated knowing in a conceptual domain. Journal for Research in Mathematics Education, 22, 170-218.
Hayward, W. G., & Tarr, M. J. (1995). Spatial language and spatial representation. Cognition, 55, 39-84.
Hewitt, P. G. (1989). Conceptual physics, 6th ed. Glenview, IL: Scott, Foresman.
Hutchins, E. (1995a). Cognition in the wild. Cambridge, MA: The MIT Press.
Hutchins, E. (1995b). How a cockpit remembers its speeds. Cognitive Science, 19, 265-288.
Jarvilehto, T. (1998a). The theory of the organism-environment system: I. Description of the theory. Integrative Physiological and Behavioral Science, 33, 317-330.
Jarvilehto, T. (1998b). The theory of the organism-environment system: II. Significance of nervous activity in the organism-environment system. Integrative Physiological and Behavioral Science, 33, 331-338.
Johnson, M. (1987). The body in the mind: The bodily basis of imagination, reason, and meaning. Chicago: University of Chicago Press.
Jordan, B., & Henderson, A. (1995). Interaction analysis: Foundations and practice. The Journal of the Learning Sciences, 4, 39-103.
Kirsh, D. (1995). The intelligent use of space. Artificial Intelligence, 73, 31-68.
Larkin, J. H., McDermott, J., Simon, D. P., & Simon, H. A. (1980). Expert and novice performance in solving physics problems. Science, 208, 1335-1342.
Lave, J. (1988). Cognition in practice: Mind, mathematics and culture in everyday life. Cambridge: Cambridge University Press.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
Mandelblit, N., & Zachar, O. (1998). The notion of dynamic unit: Conceptual developments in cognitive science. Cognitive Science, 22, 229-268.
Mareschal, D., & Shultz, T. R. (1996). Generative connectionist networks and constructivist cognitive development. Cognitive Development, 11, 571-603.
Masciotra, D., & Roth, W.-M. (1999, March). Beyond reflection-in-action: A case study of questioning in science teaching. Paper presented at the annual conference of the National Association for Research in Science Teaching, Boston, Mass.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
Metz, K. E. (1993). Preschoolers' developing knowledge of the pan balance: From new representation to transformed problem solving. Cognition and Instruction, 11, 31-93.
Orr, J. E. (1990). Sharing knowledge, celebrating identity: Community memory in a service culture. In D. Middleton & D. Edwards (Eds.), Collective remembering (pp. 169-189). London: Sage.
Prigogine, I. (1980). From being to becoming: Time and complexity in the physical sciences. San Francisco, CA: Freeman.
Quine, W. V. (1995). From stimulus to science. Cambridge, Mass: Harvard University Press.
Roth, W.-M. (1995a). Affordances of computers in teacher-student interactions: The case of Interactive Physics(TM). Journal of Research in Science Teaching, 32, 329-347.
Roth, W.-M. (1995b). Authentic school science: Knowing and learning in open-inquiry laboratories. Dordrecht, Netherlands: Kluwer Academic Publishing.
Roth, W.-M. (1996a). Art and artifact of children's designing: A situated cognition perspective. The Journal of the Learning Sciences, 5, 129-166.
Roth, W.-M. (1996b). Knowledge diffusion* in a Grade 4-5 classroom during a unit on civil engineering: An analysis of a classroom community in terms of its changing resources and practices. Cognition and Instruction, 14, 179-220.
Roth, W.-M. (1996c). The co-evolution of situated language and physics knowing. Journal of Science Education and Technology, 3, 171-191.
Roth, W.-M. (1996d). Thinking with hands, eyes, and signs: Multimodal science talk in a grade 6/7 unit on simple machines. Interactive Learning Environments, 4, 170-187.
Roth, W.-M. (1998a). Designing communities. Dordrecht, Netherlands: Kluwer Academic Publishing.
Roth, W.-M. (1998b). Starting small and with uncertainty: Toward a neurocomputational account of knowing and learning in science. International Journal of Science Education, 20, 1089-1105.
Roth, W.-M. (1998c). Situated cognition and assessment of competence in science. Evaluation and Program Planning, 21, 155-169.
Roth, W.-M. (1999, April). From iconic gesture to sign and discourse: embodiment as precursor to scientific knowledge. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Quebec.
Roth, W.-M. (in press). Discourse and agency in school science laboratories. Discourse Processes.
Roth, W.-M., & Duit, R. (1998). Talk as medium for development: Interactions of activity, individual conceptions, and community discourse. Cognitive Science.
Roth, W.-M., & Masciotra, D. (in press). Relationality as an alternative to reflectivity. Teachers and Teaching: Theory and Practice.
Roth, W.-M., Masciotra, D., & Boyd, N. (in press). Becoming-in-the-classroom: a case study of teacher development through coteaching. Teaching and Teacher Education.
Roth, W.-M., McGinn, M. K., Woszczyna, C., & Boutonné, S. (in press). Differential participation during science conversations: The interaction of display artifacts, social configuration, and physical arrangements. The Journal of the Learning Sciences.
Roth, W.-M., Woszczyna, C., & Smith, G. (1996). Affordances and constraints of computers in science education. Journal of Research in Science Teaching, 33, 995-1017.
Scribner, S. (1986). Thinking in action: some characteristics of practical thought. In R. J. Sternberg & R. K. Wagner (Eds.), Practical intelligence: Nature and origins of competence in the everyday world (pp. 13-30). Cambridge: Cambridge University Press.
Suchman, L. A. (1987). Plans and situated actions: The problem of human-machine communication. Cambridge: Cambridge University Press.
Suchman, L. A., & Trigg, R. H. (1993). Artificial intelligence as craftwork. In S. Chaiklin & J. Lave (Eds.), Understanding practice: Perspectives on activity and context (pp. 144-178). Cambridge: Cambridge University Press.
Tobin, K., Espinet, M., Byrd, S. E., & Adams, D. (1988). Alternative perspectives of effective science teaching. Science Education, 72, 433-451.
Varela, F. J., Thompson, E., & Rosch, E. (1993). The embodied mind: Cognitive science and human experience. Cambridge, MA: MIT.
Winograd, T. (Ed.). (1996). Bringing design to software. New York, NY: ACM Press.
Winograd, T., & Flores, F. (1987). Understanding computers and cognition: A new foundation for design. Norwood, NJ: Ablex.
 


[1] In the framework developed here, all structural aspects of human agency that we recognize as contributing to cognitive activity are located (i.e., situated) somewhere along the agent-in-setting continuum. Some structures are embodied more on the agent side of this continuum, others more on the setting side. Where the most salient and significant structures lie along the continuum is, to me, an empirical matter rather than one to be decided a priori.
[2] As Jonna Kulikowich pointed out to me, my own perspectives on what the world of this classroom looks like change with the setting: my perspectives on what is happening in the situations reported below depend on the time scale considered and therefore differ for Roth the teacher in situation, the physicist, and the cognitive analyst of videotapes. (See also the paper by Kulikowich and Young, this issue.)
[3] Research in social and discursive psychology (e.g., Edwards & Potter, 1992), sociology (e.g., Bourdieu, 1990), and sociology of science (e.g., Gilbert & Mulkay, 1984) showed that individuals, when asked, may describe and explain their ontologies in ways inconsistent with their actions.
[4] Here, for ease of reading, I use [velocity] and [force] to denote the respective arrows. However, especially in the excerpts presented here, students do not perceive these arrows as denoting "velocity" or "force" or any thereby reified natural phenomena.
[5] Descriptions and theorizing of the dynamic and emergent aspects in my teaching from the same agent-in-setting perspective can be found elsewhere (Masciotra & Roth, 1999; Roth & Masciotra, in press; Roth, Masciotra, & Boyd, in press).
[6] In most general terms, constraints limit the possibilities of actions. In some situations, this limitation brings with it an affordance: handles limit the places where one might try to touch an object to carry it, but also allow for an easier way to actually do the carrying task. In other situations, a constraint prevents people from doing what they intend or are supposed to be doing: many newcomers to Macintosh computers were afraid to eject their diskettes, which required moving the diskette icon over the trash can icon, because they thought they would lose the diskette contents (Winograd, 1996).
[7] Talk about activity is a different kind of activity, with a different focus, and different context and properties (Bourdieu, 1990). Thus, it is not surprising that researchers frequently find little overlap between teachers' actions in the classroom and their descriptions and explanations of these actions (e.g., Tobin, Espinet, Byrd, & Adams, 1988).
[8] The notion of interactive stabilization has particular appeal because it is consistent with my computer models of interpretation formation. Here, the dynamic of a group with respect to the individual interpretations is modeled as constraint satisfaction among multiple interacting hypotheses in connectionist networks.