I want to monitor the traffic outside my house on Tenth Street.
To what extent can low-level information be assimilated into a higher-level chunking framework to glean knowledge about the traffic? Domain-specific knowledge will certainly be needed to process this low-level information. The question I am asking, though, is: "To what extent can simple low-level information capture an understanding of the scene, and which techniques are the most efficient compared to the others?" Granted, the knowledge induced from this low-level information will generate further knowledge that keeps expanding over time, a la data mining, but its range will still be limited by what is immediately gleaned.
Hopefully I can figure out what type of traffic there is, say trucks or cars, and then the quantity of each at certain times of day.
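A minimal sketch of what such a tally could look like, assuming some earlier stage already yields one (timestamp, vehicle type) pair per vehicle. The function name and the sample observations are made up for illustration; nothing here is measured data.

```python
from collections import Counter
from datetime import datetime


def tally_by_hour(observations):
    """Count vehicles per (hour of day, vehicle type).

    observations: iterable of (datetime, vehicle_type) pairs, assumed to
    come from an earlier detection/classification stage.
    """
    counts = Counter()
    for timestamp, vehicle_type in observations:
        counts[(timestamp.hour, vehicle_type)] += 1
    return counts


# Hypothetical observations, invented purely to exercise the function.
observations = [
    (datetime(2000, 1, 1, 8, 5), "car"),
    (datetime(2000, 1, 1, 8, 40), "truck"),
    (datetime(2000, 1, 1, 8, 55), "car"),
    (datetime(2000, 1, 1, 17, 10), "car"),
]
counts = tally_by_hour(observations)
for (hour, vehicle_type), n in sorted(counts.items()):
    print(f"{hour:02d}:00  {vehicle_type:<5}  {n}")
```

Keying the counter on the (hour, type) pair keeps the structure flat, so finer time buckets or extra vehicle types need no change to the function.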
One more difficulty that must be resolved before the above can be accomplished is simple in concept: I need either to track these vehicles or to determine, when they cross a certain location, what type of vehicle they are and what direction they are headed. The latter is a loaded question because my camera is placed off to the side, so there is effectively occlusion and ambiguity in differentiating the vehicles as they pass.
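As a rough illustration of the "cross a certain location" option, here is a sketch that decides the direction of travel from a sequence of horizontal blob positions. It assumes an earlier stage supplies one centroid x coordinate per frame for a single vehicle, and that the checkpoint is a vertical line at a hypothetical column line_x. It does nothing about occlusion.

```python
def crossing_direction(xs, line_x):
    """Return the direction in which a track crosses the line x = line_x.

    xs: centroid x coordinates of one vehicle in consecutive frames
        (assumed to come from some earlier blob-detection stage).
    Returns "left_to_right", "right_to_left", or None if the track
    never crosses the line.
    """
    for prev, curr in zip(xs, xs[1:]):
        if prev < line_x <= curr:
            return "left_to_right"
        if curr <= line_x < prev:
            return "right_to_left"
    return None


# Hypothetical tracks, invented for illustration only.
eastbound = [40, 70, 110, 150, 190]
westbound = [300, 250, 190, 140, 90]
print(crossing_direction(eastbound, 160))
print(crossing_direction(westbound, 160))
```

Only the first crossing is reported, which is enough for counting but would need refining if a merged blob from two overlapping vehicles wandered back and forth over the line.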
I believe image histories will be best. I may upgrade the mechanism I use if my analysis tasks change.
Real-time analysis "for each frame" is important, I feel, to gathering the most about this scene, i.e. every frame must be analyzed, not, say, every other one. Granted, this assumption rests on the claim that tracking will be required for maximal knowledge creation.
Accomplished, though the WinCast/TV board didn't work. I've been using IXMicro's TurboTv board instead. They both use the BTTV848 chip, though the IXMicro one uses the BTTV848A version. This may explain why it works on my box while the WinCast/TV one did not.
Capture is being done in real time on my Linux box using X Windows, even though my X Windows system is using an MDA video card with Hercules extensions.
I tried several types: image differencing, histories of image differencing, accumulated image differencing with respect to a past set frame, and image trails. I ended up deciding that a history of image differencing AND accumulation with respect to the past three frames provided the best information: not only did the cars appear solid, but there was little bleeding between cars going in the same direction, i.e. the car behind did not bleed into the car ahead, so they could be differentiated.
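One possible reading of "differencing accumulated over the past three frames" is sketched below. It is an interpretation, not the code actually used: each new frame is differenced against each of the last three frames, the per-pixel maximum of those differences is kept, and the result is thresholded. The class name, the threshold of 20, and the use of plain nested lists for grayscale frames are all assumptions.

```python
from collections import deque


def abs_diff(a, b):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


class DifferenceHistory:
    """Accumulates differences between the newest frame and recent frames."""

    def __init__(self, depth=3, threshold=20):
        self.frames = deque(maxlen=depth)  # the last `depth` frames seen
        self.threshold = threshold         # assumed value, tune per camera

    def update(self, frame):
        """Return a 0/1 motion mask for `frame`, then remember the frame."""
        height, width = len(frame), len(frame[0])
        accumulated = [[0] * width for _ in range(height)]
        for past in self.frames:
            diff = abs_diff(frame, past)
            for y in range(height):
                for x in range(width):
                    # Keep the strongest difference against any past frame.
                    accumulated[y][x] = max(accumulated[y][x], diff[y][x])
        self.frames.append(frame)
        return [[1 if v > self.threshold else 0 for v in row]
                for row in accumulated]


# Tiny synthetic example: one bright pixel moving right along a 1x5 strip.
history = DifferenceHistory(depth=3, threshold=20)
for frame in ([[0, 0, 0, 0, 0]],
              [[100, 0, 0, 0, 0]],
              [[0, 100, 0, 0, 0]],
              [[0, 0, 100, 0, 0]]):
    print(history.update(frame))
```

The very first frame yields an empty mask because there is nothing to compare against yet. A deeper history fills in slow or uniformly colored vehicles more solidly, at the cost of a longer trail behind them.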
That was a naive comment! More needs to be done before analysis is even
started:
Indeed, image differencing alone just doesn't hack it. Since the surface of
the road is planar and, last but not least, the vertical sides of the vehicles
are roughly planar, an affine transformation would be a good addition.
So my code at the moment transforms the road using an affine transformation.
Later, image understanding analysis will be done on this result.
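For illustration, here is a small sketch of warping an image with an affine map, using inverse mapping and nearest-neighbor sampling. It is not the transformation code referred to above; the function name, the matrix layout, and the toy image are assumptions. Note that an affine map can only translate, rotate, scale, and shear, so it straightens a slanted road but does not undo perspective foreshortening.

```python
def affine_warp(image, inverse, out_width, out_height, fill=0):
    """Warp a grayscale image (nested lists) with an affine transformation.

    inverse: ((a, b, c), (d, e, f)) mapping each OUTPUT pixel (x, y) to
             the SOURCE position (a*x + b*y + c, d*x + e*y + f).
    Pixels that map outside the source image are set to `fill`.
    """
    (a, b, c), (d, e, f) = inverse
    src_height, src_width = len(image), len(image[0])
    output = [[fill] * out_width for _ in range(out_height)]
    for y in range(out_height):
        for x in range(out_width):
            # Nearest-neighbor sampling of the source image.
            sx = int(round(a * x + b * y + c))
            sy = int(round(d * x + e * y + f))
            if 0 <= sx < src_width and 0 <= sy < src_height:
                output[y][x] = image[sy][sx]
    return output


# Toy image in which the second row is sheared one pixel to the right.
sheared = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
]
# Inverse map (x, y) -> (x + y, y) removes that shear.
unshear = ((1, 1, 0), (0, 1, 0))
for row in affine_warp(sheared, unshear, out_width=3, out_height=2):
    print(row)
```

Iterating over output pixels and looking backwards into the source guarantees every output pixel gets a value, which a forward mapping would not.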
An interest of mine is to identify the type of vehicles that pass, beyond just whether they are a car or a truck, though trucks may or may not have a hitch. Differentiating the latter two may require more analysis logic, and how it would be accomplished, and with what techniques it might be combined, would be interesting. Indeed, some types may require a different construct: say, identifying police cars. They may need to be analyzed through a "color-model". Would just a model of the colors do, or would the model also need to encompass more physical information about such a car, such as its shape?
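One simple form a pure "color-model" could take is a coarse color histogram compared by histogram intersection, sketched below. This is only an assumption about what such a model might be; the bin count, function names, and pixel values are invented for illustration. It deliberately ignores shape, which is exactly the limitation the question above raises.

```python
def color_histogram(pixels, bins=4):
    """Normalized coarse RGB histogram.

    pixels: non-empty list of (r, g, b) tuples with components in 0..255.
    bins:   number of buckets per channel (an assumed, tunable value).
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1.0
    total = len(pixels)
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 0 (disjoint) to 1 (same)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical pixel samples, invented for illustration only.
black_and_white_model = color_histogram([(255, 255, 255)] * 3 + [(0, 0, 0)])
similar_blob = color_histogram([(250, 250, 250)] * 3 + [(10, 10, 10)])
red_blob = color_histogram([(200, 0, 0)] * 4)

print(histogram_intersection(black_and_white_model, similar_blob))
print(histogram_intersection(black_and_white_model, red_blob))
```

Any vehicle with a similar mix of colors would score just as well as the intended target, so a score like this would most likely serve as one cue among several rather than as a classifier on its own.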