Activity Learning by Demonstration

Abhishek Saxena

Contents :

Activity Learning by Demonstration (LbD), viewed from a human-robot interaction perspective, has been receiving increased attention as a method for teaching robots everyday activities, and hence for task transfer, so that the robot eventually learns to perform the activity or task itself. This project explores LbD with the aim of providing insights into some idiosyncrasies of human behavior that need to be tackled to design better LbD systems, and of suggesting some design choices accordingly. The experiment considers a computer program or robot with limited vision capabilities, which can only detect the movement of specific objects on a tabletop.

The System :

The software system used for running the experiments is hosted on a single computer with an attached camera. The underlying application employs multiple techniques, such as Hidden Markov Models (HMMs) and higher-level inference, to learn the activities demonstrated to it. The HMMs have been trained to recognize the simple human actions that compose the activities to be taught, with a single HMM trained for every elementary action. Typical examples of such actions include a push, pull, lift or hide applied to an object.

The system first records the demonstrated activity as a set of observations and then divides this observation set into chunks. Each chunk is fed to every HMM, and the action corresponding to the HMM with the highest response is taken to be the one that occurred. The result is a sequence of elementary actions that corresponds to a complete compound activity. The user then explicitly provides a label for the activity, and the activity is thereby learned and recorded in the knowledge base.
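The chunk-and-score step described above can be sketched as follows. This is a minimal illustration and not the system's actual implementation: the discrete observation symbols (0 = still, 1 = moving away, 2 = moving closer), the fixed chunk length, the two toy models in MODELS and every parameter value are assumptions made for the example. The "response" of an HMM is taken here to be the log-likelihood computed by the forward algorithm.

```python
import math

def log_likelihood(chunk, start, trans, emit):
    """Scaled forward algorithm: log P(chunk | HMM) for discrete observations."""
    n_states = len(start)
    alpha = [start[s] * emit[s][chunk[0]] for s in range(n_states)]
    total = 0.0
    for t in range(len(chunk)):
        if t > 0:
            # Propagate the previous (normalized) state beliefs one step.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][chunk[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this chunk
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total

def recognize_actions(observations, models, chunk_len):
    """Split the recording into chunks and label each with the best-scoring HMM."""
    chunks = [observations[i:i + chunk_len]
              for i in range(0, len(observations), chunk_len)]
    return [
        max(models, key=lambda name: log_likelihood(chunk, *models[name]))
        for chunk in chunks
    ]

# Hypothetical two-state models: state 0 = object at rest, state 1 = object moving.
START = [0.8, 0.2]
TRANS = [[0.7, 0.3], [0.1, 0.9]]
MODELS = {
    "push": (START, TRANS, [[0.8, 0.15, 0.05], [0.1, 0.8, 0.1]]),
    "pull": (START, TRANS, [[0.8, 0.15, 0.05], [0.1, 0.1, 0.8]]),
}

actions = recognize_actions([0, 1, 1, 1, 0, 2, 2, 2], MODELS, chunk_len=4)
print(actions)
```

With these toy parameters the first chunk scores highest under the "push" model and the second under the "pull" model, so the recording is reduced to a two-action sequence that a user could then label.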

The Experiment :

The experimental setup is as depicted in the following snapshot :

The experiment deals with tabletop activities. To tackle the object recognition problem, fiducial markers (rectangular figures recognizable by specifically designed vision algorithms) were used, with a single marker attached to each object. It may thus be noted that ‘learning an activity’, from the system’s perspective, essentially consists of learning the patterns in which these objects are moved around on the tabletop, and this is what the system later consults when it is tested on using the learned knowledge to recognize an activity.

The following three activities, which are variants of how a human would typically prepare breakfast, were chosen for the experiment:

1. Preparing Corn Flakes + Coffee

2. Preparing Bread slices + Corn Flakes

3. Preparing Bread slices + Coffee

The central idea is that, given the difference in the aims of the individual activities, the patterns in which the objects are moved should differ and hence serve as a basis for telling the activities apart.

The experiment was conducted with 10 research subjects, one at a time. For each of the three activities above, every subject went through three phases:

Demonstration of the activity to the system
Performance of the activity again using the same sequence of steps and testing if the system can recognize them
Performance of the activity again in a more relaxed manner and again testing for recognition

It should be noted that asking users to perform the same sequence of steps does not necessarily lead to exactly the same flow of steps, as users tend to focus on certain parts of an activity more than others and naturally do not recall every single detail of an activity. Of course, explicitly asking users to perform the activity naturally results in considerably more ambiguity for the system.

Two kinds of recognition results were obtained, based on two separate knowledge bases: the first comprising only the activities taught by a particular user (individualized result), and the second accumulated over time as successive users trained the system (cumulative result).

The system was pre-trained on some activities tagged as ‘null activities’, which essentially indicate that the user is not doing anything useful.
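The difference between the two knowledge bases, and the role of the ‘null activity’ entries, can be illustrated with a small best-match recognizer. This is only a sketch under stated assumptions: the report does not say how the best match is computed, so the similarity measure (Python's difflib.SequenceMatcher), the object names and all action sequences below are hypothetical.

```python
from difflib import SequenceMatcher

def recognize(actions, knowledge_base):
    """Return the label of the stored sequence most similar to the observed one."""
    best_label, best_score = None, -1.0
    for label, stored in knowledge_base:
        score = SequenceMatcher(None, actions, stored).ratio()
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical pre-trained entry: aimless movement of a single object.
NULL_ENTRIES = [("null activity", [("push", "cup"), ("pull", "cup")])]

# Hypothetical entries taught by the current user and by an earlier user.
this_user = [("Corn Flakes + Coffee",
              [("pull", "bowl"), ("lift", "cereal box"),
               ("pull", "cup"), ("lift", "coffee jar")])]
earlier_users = [("Bread slices + Coffee",
                  [("pull", "plate"), ("lift", "bread"),
                   ("pull", "cup"), ("lift", "coffee jar")])]

individual_kb = NULL_ENTRIES + this_user
cumulative_kb = NULL_ENTRIES + earlier_users + this_user

# A relaxed repetition in which the user skips one step.
observed = [("pull", "bowl"), ("lift", "cereal box"), ("lift", "coffee jar")]
individual_result = recognize(observed, individual_kb)
cumulative_result = recognize(observed, cumulative_kb)
print(individual_result, "|", cumulative_result)
```

In this toy case both knowledge bases return the same label. The cumulative knowledge base simply contains more candidate entries, which is also why a poorly taught entry from one user can affect recognition for later users.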

The results for the individualized training, expressed as % recognition accuracy, are given below :

Activity	Phase I	Phase II
Corn Flakes+Coffee	100 %	70 %
Corn Flakes+Bread	60 %	20 %
Coffee+Bread	50 %	30 %

As the table above illustrates, performing the activity ‘naturally’ in Phase II leads to a drastic decrease in recognition accuracy (between 20 and 40 percentage points across the three activities), although some bias carried over from Phase I is inevitable. The 70% reading for the first activity is due to three cases in which the system confuses Activity 1 with a ‘null activity’. In Phase I, however, since the activity is controlled, the system can tell that ‘something is being done’, and since the output corresponds to the best match found in the knowledge base, the system gets it right.

The results for the cumulative case were plotted as a graph of the net accuracy obtained at each time step, with each time step corresponding to one user; a value of 1 corresponds to 100%.
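One way to read ‘net accuracy at each time step’ is as a running accuracy over all trials seen up to and including the current user. The sketch below computes such a curve under that assumption; the per-user outcomes are made-up placeholder values, not the experimental data.

```python
def running_accuracy(outcomes_per_user):
    """Net accuracy after each user, as a fraction where 1.0 means 100%."""
    curve = []
    correct = total = 0
    for outcomes in outcomes_per_user:
        correct += sum(outcomes)   # True counts as 1, False as 0
        total += len(outcomes)
        curve.append(correct / total if total else 0.0)
    return curve

# Placeholder outcomes: one list of recognition results (correct or not) per user.
placeholder_outcomes = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
]
curve = running_accuracy(placeholder_outcomes)
print(curve)
```

Each value in the returned list corresponds to one point on such a graph, one per user in the order in which they trained and tested the system.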

Observations and Conclusions :

The following observations were made during and after the experiments :

Humans tend to focus on the ‘activity core’

What this observation implies is as follows. Consider any typical human activity. Whether in a setup such as the one used for this experiment or in almost any other configuration, it essentially consists of two parts: the main part, or ‘core’, which involves the core set of actions that define the activity, and a supplementary part, which involves actions that, although they might be necessary, do not contribute to the activity’s definition. Observation of the subjects’ performances reveals that they focus much more (consciously or subconsciously) on the ‘core’ and tend to be more careful and aware of their interaction with the system when performing it.

Human recall and performance of activities are based on ‘activity semantics’, and hence a robot can gain a lot from some knowledge-based interpretation of activity semantics

It is apparent from the experimental observations that there is large variation in how individuals perform an activity, and hence in the patterns in which the objects concerned are moved on the tabletop. This variation in turn depends on an individual’s perceived affordances of the workspace and on the idiosyncrasies of his or her activity performance. Although it might work to some extent for shorter activities, pure tracking-based learning and recognition is bound to run into problems as activities become more complex, since complexity increases the number of possible patterns in which the objects can be moved to do the same thing. It would thus be advisable for an activity LbD system to try to extract semantic information from the learned activity, e.g. by trying to learn causal rules from demonstrations. A possible design choice here is to distinguish between actions and ‘states’ when describing what is happening in the workspace. Simple causal rule learning could then correspond to learning a mapping from actions to changes in world states. Alternatively, if some pre-existing knowledge about any of the objects present in the workspace is available, it could be exploited to understand more about what is going on.
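The idea of learning a mapping from actions to changes in world states can be sketched as follows. This is an illustrative assumption about how such a design might look, not something the experimental system does: world states are represented as sets of (predicate, object) facts, and the predicates "on_table", "in_hand" and "visible", the min_support threshold and all transitions are hypothetical.

```python
from collections import Counter, defaultdict

def learn_causal_rules(transitions, min_support=2):
    """Map each action verb to its most frequently observed state change.

    transitions: iterable of (state_before, (verb, obj), state_after), where a
    state is a set of (predicate, obj) facts. Only facts about the acted-on
    object are considered, so rules generalize across objects.
    """
    effect_counts = defaultdict(Counter)
    for before, (verb, obj), after in transitions:
        added = frozenset(p for p, o in after - before if o == obj)
        removed = frozenset(p for p, o in before - after if o == obj)
        effect_counts[verb][(added, removed)] += 1

    rules = {}
    for verb, counts in effect_counts.items():
        (added, removed), support = counts.most_common(1)[0]
        if support >= min_support:  # ignore effects seen too rarely
            rules[verb] = {"adds": set(added), "removes": set(removed)}
    return rules

# Hypothetical transitions extracted from demonstrations.
transitions = [
    ({("on_table", "cup"), ("on_table", "bowl")}, ("lift", "cup"),
     {("in_hand", "cup"), ("on_table", "bowl")}),
    ({("on_table", "bowl")}, ("lift", "bowl"), {("in_hand", "bowl")}),
    ({("visible", "cup")}, ("hide", "cup"), set()),
]
rules = learn_causal_rules(transitions)
print(rules)
```

Here "lift" is learned as an action that removes on_table and adds in_hand for whichever object it is applied to, while "hide" is dropped because it was observed only once. Rules of this kind describe what an action achieves rather than the exact trajectory, which is the point of separating actions from states.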

Credit assignment becomes crucial for a system that tries to gather a large amount of knowledge or to learn from multiple sources

The experimental data for the ‘cumulative learning’ case show that a few of the learned entries in the knowledge base are responsible for most of the recognition errors (the Pareto 80/20 principle). One solution to this problem is to adopt credit assignment strategies. Credit assignment means being able to locate the point of fault in the learned knowledge once it is known that something is wrong.
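A very simple credit assignment strategy is to record, for every recognition trial, which knowledge base entry produced the answer, and then to count how many errors each entry caused. The sketch below assumes such a trial log exists; the entry identifiers ("kb-01" and so on), the 0.8 threshold and the log contents are hypothetical.

```python
from collections import Counter

def error_counts(trial_log):
    """Count recognition errors per knowledge base entry.

    trial_log: iterable of (matched_entry_id, predicted_label, true_label).
    """
    return Counter(entry for entry, predicted, actual in trial_log
                   if predicted != actual)

def main_culprits(errors, share=0.8):
    """Smallest set of entries, worst first, covering at least `share` of all errors."""
    total = sum(errors.values())
    culprits, covered = [], 0
    for entry, count in errors.most_common():
        if total == 0 or covered / total >= share:
            break
        culprits.append(entry)
        covered += count
    return culprits

# Hypothetical trial log: entry kb-03 is matched wrongly far more often than the rest.
trial_log = (
    [("kb-03", "Coffee + Bread", "Corn Flakes + Bread")] * 5
    + [("kb-07", "null activity", "Corn Flakes + Coffee")]
    + [("kb-01", "Corn Flakes + Coffee", "Corn Flakes + Coffee")] * 3
    + [("kb-03", "Coffee + Bread", "Coffee + Bread")]
)
errors = error_counts(trial_log)
culprits = main_culprits(errors)
print(errors, culprits)
```

Entries flagged this way could then be re-taught, down-weighted or removed. Which of these is appropriate is a separate design decision that the experiment does not settle.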

Generalization and Action ordering

The experimental observations illustrate, to some extent, that the problems inherent in LbD are related. People usually find it difficult to recall the exact sequence of steps they performed when teaching the system, and hence even asking them to repeat the steps does not lead to the same sequence being repeated. However, understanding which steps can be interchanged requires a large amount of statistics, aggregated and generalized in the form of knowledge. A knowledge-based approach to learning seems viable in the sense that most observations suggest a design that allows the system to bias further learning using the knowledge gathered earlier.
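As a rough illustration of how ordering statistics could be aggregated, the sketch below counts, over several demonstrations of the same activity, how often each step precedes each other step; pairs seen in both orders are treated as interchangeable. The step names and demonstrations are hypothetical, and the sketch assumes that every step occurs at most once per demonstration.

```python
from collections import Counter
from itertools import combinations

def order_statistics(demonstrations):
    """Count how often step a was performed before step b, for every pair."""
    before = Counter()
    for demo in demonstrations:
        position = {step: i for i, step in enumerate(demo)}
        for a, b in combinations(sorted(position), 2):
            if position[a] < position[b]:
                before[(a, b)] += 1
            else:
                before[(b, a)] += 1
    return before

def interchangeable_pairs(demonstrations, min_count=1):
    """Pairs of steps observed in both orders at least `min_count` times each."""
    before = order_statistics(demonstrations)
    return {
        (a, b)
        for (a, b), n in before.items()
        if a < b and n >= min_count and before[(b, a)] >= min_count
    }

# Hypothetical demonstrations of the same activity by different users.
demos = [
    ["get bowl", "pour cereal", "get cup", "pour coffee"],
    ["get cup", "pour coffee", "get bowl", "pour cereal"],
    ["get bowl", "get cup", "pour cereal", "pour coffee"],
]
pairs = interchangeable_pairs(demos)
print(sorted(pairs))
```

In this toy data the bowl is always fetched before the cereal is poured, so that pair keeps a fixed order, whereas fetching the bowl and fetching the cup appear in both orders. With only a handful of demonstrations such conclusions are unreliable, which is consistent with the remark above that a large amount of statistics is needed.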