Activity Learning by Demonstration
Abhishek Saxena
Activity Learning by Demonstration (LbD), viewed from a human-robot interaction perspective, has been receiving increased attention as a method for teaching robots everyday activities, so that the robot eventually learns to perform the activity or task itself. This project explores LbD with the aim of providing insight into some idiosyncrasies of human behavior that need to be tackled when designing better LbD systems, and of suggesting some design choices accordingly. The experiment considers a computer program/robot with limited vision capabilities, one that can only detect the movement of specific objects on a tabletop.
The software system used to run the experiments is hosted on a single computer with a camera attached. The underlying application employs multiple techniques, such as Hidden Markov Models (HMMs) and higher-level inference, to learn the activities demonstrated to it. The HMMs have been trained to recognize the simple human actions that in turn compose the activities to be taught, with a single HMM trained for every elementary action. Typical examples of such actions include a push, pull, lift or hide as applied to an object.
The system first records the demonstrated activity as a set of observations and then divides this observation set into chunks. Each chunk is fed to every HMM, and the action whose HMM gives the highest response is taken to be the one that occurred. The result is a sequence of elementary actions that corresponds to a complete compound activity. The user then explicitly provides a label for the activity, and the activity is thereby learned and recorded in the knowledge base.
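The chunk-labelling step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the system's actual code: observations are assumed to be discrete symbols (0 = object moved away from the user, 1 = moved toward the user, 2 = no visible motion), chunks are assumed to have a fixed size, and the two toy models in `MODELS` use made-up parameters rather than trained ones. The "response" of an HMM is taken here to be the log-likelihood computed by the forward algorithm.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM,
    computed with the scaled forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by the emission.
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik

def chunk(observations, size):
    """Split the recorded observations into consecutive fixed-size chunks (an assumption)."""
    return [observations[i:i + size] for i in range(0, len(observations), size)]

def label_chunks(observations, models, size):
    """Label each chunk with the action whose HMM scores it highest."""
    labels = []
    for piece in chunk(observations, size):
        scores = {name: forward_log_likelihood(piece, *params)
                  for name, params in models.items()}
        labels.append(max(scores, key=scores.get))
    return labels

# Hypothetical two-state models: (start probabilities, transition matrix, emission matrix).
TRANS = [[0.8, 0.2], [0.2, 0.8]]
MODELS = {
    "push": ([0.9, 0.1], TRANS, [[0.8, 0.1, 0.1], [0.6, 0.1, 0.3]]),
    "pull": ([0.9, 0.1], TRANS, [[0.1, 0.8, 0.1], [0.1, 0.6, 0.3]]),
}

OBSERVATIONS = [0, 0, 0, 0, 1, 1, 1, 1]
LABELS = label_chunks(OBSERVATIONS, MODELS, size=4)
print(LABELS)  # ['push', 'pull']
```

The resulting list of action labels plays the role of the compound activity that the user then names.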
The experimental setup is as depicted in the following snapshot:
The experiment deals with tabletop activities. To tackle the object recognition problem, fiducial markers (rectangular figures recognizable by specifically designed vision algorithms) have been used, with a single marker attached to each object. Note that, from the system’s perspective, ‘learning an activity’ therefore essentially consists of learning the patterns in which these objects are moved around on the tabletop, and these patterns are what the system later consults when it is tested on using the learned knowledge to recognize an activity.
The following 3 activities, which are variants of how a human would typically prepare breakfast, were chosen for the experiment:
1. Preparing Corn Flakes + Coffee
2. Preparing Bread slices + Corn Flakes
3. Preparing Bread slices + Coffee
The central idea is that, given the difference in the aims of the individual activities, the patterns in which the objects are moved should change and hence serve as the basis for telling the activities apart.
The experiment was carried out with 10 research subjects, one at a time, in three phases for each of the three activities above respectively.
It should be noted that asking users to perform the same sequence of steps does not necessarily lead to exactly the same flow of steps, as users tend to focus on certain parts of an activity more than others and naturally do not recall every single detail of an activity. Of course, explicitly asking the users to perform the activity ‘naturally’ results in a lot more ambiguity for the system.
The results included two kinds of recognition results, based on two separate knowledge bases: the first comprising the activities taught solely by a particular user (individualized result), and the second accumulated over time as users trained the system (cumulative result).
The system was pre-trained on some activities tagged as ‘null activities’, which essentially indicate that the user isn’t doing anything useful.
The results obtained as % recognition accuracy on the individualized training are given below:
Activity | Phase I | Phase II |
Corn Flakes+Coffee | 100 % | 70 % |
Corn Flakes+Bread | 60 % | 20 % |
Coffee+Bread | 50 % | 30 % |
As the above table illustrates, performing the activity ‘naturally’ in Phase II leads to a drastic decrease in recognition accuracy, even though some bias carried over from Phase I is inevitable. The 70% reading for the first activity is due to the 3 cases in which the system confuses Activity 1 with a ‘null activity’. In Phase I, however, since the activity is controlled, the system can tell that ‘something is being done’, and since the output corresponds to the best match found in the knowledge base, the system gets it right.
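To illustrate how a best-match lookup can drift toward a ‘null activity’, here is a small sketch. The matching metric used by the actual system is not described in this report; edit distance over action labels is assumed purely for illustration, and the knowledge base `KB` and both demonstrations are made-up toy sequences, not experimental data.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of action labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]

def recognize(sequence, knowledge_base):
    """Return the label of the stored entry closest to the observed sequence."""
    label, _ = min(knowledge_base, key=lambda entry: edit_distance(sequence, entry[1]))
    return label

# Hypothetical knowledge base: (label, learned action sequence).
KB = [
    ("null activity", ["push"]),
    ("Corn Flakes + Coffee", ["lift", "push", "lift", "pull"]),
    ("Bread slices + Coffee", ["pull", "lift", "push", "hide"]),
]

controlled = ["lift", "push", "lift", "pull"]  # repeats the taught steps exactly
natural = ["push", "lift"]                     # looser, shorter performance

print(recognize(controlled, KB))  # Corn Flakes + Coffee
print(recognize(natural, KB))     # null activity
```

In this toy case the short, loosely performed sequence is closer to the stored null entry than to any taught activity, which is the kind of confusion reported for Phase II.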
The results obtained for the cumulative case were plotted as a graph of the net accuracy obtained at each time step, with one time step per user; a value of 1 corresponds to 100%.
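One plausible reading of ‘net accuracy at each time step’ is the fraction of all recognition attempts so far that were correct, updated after each user. The sketch below assumes that reading; the outcome lists are placeholders, not the recorded results.

```python
def running_accuracy(outcomes_per_user):
    """Net accuracy after each user: correct attempts so far / all attempts so far."""
    correct = total = 0
    curve = []
    for outcomes in outcomes_per_user:
        correct += sum(outcomes)  # True counts as 1, False as 0
        total += len(outcomes)
        curve.append(correct / total if total else 0.0)
    return curve

# Placeholder data: one list of per-activity outcomes for each user, in test order.
OUTCOMES = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
]

CURVE = running_accuracy(OUTCOMES)
print(CURVE)  # values lie between 0 and 1, where 1 corresponds to 100%
```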
Observations and Conclusions:
The following observations were made during and after the experiments:
What the above observation implies is as follows. Consider any typical human activity: whether in a setup such as the one used for this experiment or in almost any other configuration, it essentially consists of two parts. One is the main part, or core, of the activity, comprising the core set of actions that define the activity; the other is the supplementary part, comprising actions that, although they might be necessary, don’t contribute to the activity’s definition. Observation of the subjects’ activity performances reveals that they focus much more (consciously or subconsciously) on the ‘core’ and tend to be more careful about, and aware of, their interaction with the system when performing the core.
It’s apparent from the experimental observations that there is a large variation in the way individuals perform an activity, and hence in the patterns in which the objects concerned are moved on the tabletop. This in turn depends on an individual’s perceived affordances of the workspace and the idiosyncrasies of his/her activity performance. Although it might work to some extent for shorter activities, the approach of purely tracking-based learning/recognition is bound to run into problems as the activities get more complex, since complexity increases the number of possible patterns in which the objects can be moved to do the same thing. It would thus be advisable for an activity LbD system to try to extract semantic information from the learned activity, e.g. to try to learn causal rules from demonstrations. A possible design choice in this case could be to distinguish between actions and ‘states’ when describing what’s happening in the workspace. A simple form of causal rule learning could then correspond to learning a mapping from actions to changes in world states. Alternatively, if some pre-existing knowledge of any of the objects present in the workspace is available, it could be exploited to understand more about what’s going on.
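The idea of mapping actions to changes in world states can be made concrete with a small sketch. Everything here is an assumption made for illustration: world states are represented as dictionaries of hypothetical predicates (`on_table`, `visible`), and a ‘rule’ for an action is simply the set of state changes that occurred every time the action was demonstrated.

```python
def state_delta(before, after):
    """The (predicate, new value) pairs that differ between two world states."""
    return frozenset((key, after[key]) for key in after if before.get(key) != after[key])

def learn_rules(transitions):
    """Map each action to the state changes it produced in all of its demonstrations."""
    rules = {}
    for before, action, after in transitions:
        delta = state_delta(before, after)
        # Keep only the effects shared by every occurrence of the action seen so far.
        rules[action] = delta if action not in rules else rules[action] & delta
    return rules

# Hypothetical demonstrations: (state before, action, state after).
DEMOS = [
    ({"on_table": True, "visible": True}, "lift", {"on_table": False, "visible": True}),
    ({"on_table": True, "visible": True}, "hide", {"on_table": True, "visible": False}),
    ({"on_table": True, "visible": False}, "lift", {"on_table": False, "visible": True}),
]

RULES = learn_rules(DEMOS)
for action, effects in RULES.items():
    print(action, "->", sorted(effects))
```

Taking the intersection across demonstrations is a deliberately crude choice: it discards effects that only sometimes accompany an action, which keeps the learned rule tied to what the action reliably does rather than to how the objects happened to move.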
It has been established from the experimental data for the ‘cumulative learning’ case that a few of the learned entries in the knowledge base are responsible for most of the recognition errors (Pareto’s 80/20 principle). One solution to this problem is to adopt credit assignment strategies. Credit assignment means being able to locate the point of fault in the learned knowledge once it is known that something is wrong.
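A very simple form of credit assignment is to tally, for each knowledge-base entry, how many wrong recognitions it was the best match for, and then to look at the few entries that account for most of the errors. The sketch below assumes a hypothetical recognition log of (matched entry id, predicted label, true label) records; the entry ids and records are invented for illustration.

```python
from collections import Counter

def blame_counts(recognition_log):
    """Count, per knowledge-base entry, the wrong recognitions it was matched for."""
    return Counter(entry_id
                   for entry_id, predicted, actual in recognition_log
                   if predicted != actual)

def entries_covering(blame, share):
    """Smallest set of most-blamed entries accounting for at least `share` of the errors."""
    total = sum(blame.values())
    covered, chosen = 0, []
    for entry_id, count in blame.most_common():
        if total == 0 or covered / total >= share:
            break
        chosen.append(entry_id)
        covered += count
    return chosen

# Hypothetical log: (id of the matched entry, predicted label, true label).
LOG = [
    ("e3", "null activity", "Corn Flakes + Coffee"),
    ("e3", "null activity", "Bread slices + Coffee"),
    ("e3", "null activity", "Bread slices + Corn Flakes"),
    ("e3", "null activity", "Corn Flakes + Coffee"),
    ("e3", "null activity", "Bread slices + Coffee"),
    ("e7", "Bread slices + Coffee", "Bread slices + Corn Flakes"),
    ("e1", "Corn Flakes + Coffee", "Corn Flakes + Coffee"),
]

BLAME = blame_counts(LOG)
SUSPECTS = entries_covering(BLAME, share=0.8)
print(SUSPECTS)  # ['e3']
```

Entries flagged this way are candidates for review, re-teaching or removal; which of these is appropriate is a design decision the tally itself does not settle.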
The experimental observations illustrate to some extent that the problems inherent in LbD are related. People usually find it difficult to recall the exact sequence of steps that they performed when teaching the system, and hence even asking them to repeat the steps does not lead to the same sequence being repeated. However, understanding which steps can be interchanged requires a lot of statistics, aggregated/generalized in the form of knowledge. A knowledge-based approach to learning seems viable, in the sense that most observations suggest a design that allows the system to bias further learning using the knowledge gathered earlier.