The type you instantiate cv::Mat::at with must match the type of the individual pixels. Since cv::Scalar is basically a cv::Vec<double,4>, this won't work for a U8C3 image (it would, of course, work for an F64C4 image).
In your case you need cv::Vec3b, which is a typedef for cv::Vec<uchar,3>:
Vec3b col = I.at<Vec3b>(i, j);
You can then convert this to a cv::Scalar if you really need to, but the type you instantiate cv::Mat::at with has to match the type of your image, since it just casts the image data without doing any conversion.
Your second code snippet returns a pointer to the i-th row of the image. This is not unrelated data, but simply a pointer to single uchar values. So for a U8C3 image, every 3 consecutive elements in the data returned in p represent one pixel. Again, to get each pixel as a single element, use
Vec3b *p = I.ptr<Vec3b>(i);
which again does nothing more than cast the row pointer before returning it.
EDIT: If you want to do many pixel accesses on the image, you can also use the convenience type cv::Mat_. This is nothing more than a thin typed wrapper around the image data, so all accesses to image pixels are typed accordingly:
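To make the layout concrete, here is a minimal sketch in plain C++ that deliberately does not use OpenCV. It assumes a tightly packed 8-bit, 3-channel row; the names Pixel3b, pixelAt and makeRow are hypothetical stand-ins for illustration, not OpenCV API. It shows that pixel j occupies elements 3*j through 3*j+2 of a uchar row pointer, which is what the Vec3b cast lets you address as one element.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for cv::Vec3b: three unsigned chars, no padding.
struct Pixel3b {
    unsigned char v[3];
};

// Read pixel j from a row of interleaved 3-channel data.
// memcpy is used instead of a pointer cast to stay within standard C++.
inline Pixel3b pixelAt(const unsigned char* row, std::size_t j) {
    Pixel3b px;
    std::memcpy(px.v, row + 3 * j, 3);
    return px;
}

// Build a row of `width` pixels where pixel j holds (j, j + 100, j + 200).
inline std::vector<unsigned char> makeRow(std::size_t width) {
    std::vector<unsigned char> row(3 * width);
    for (std::size_t j = 0; j < width; ++j) {
        row[3 * j + 0] = static_cast<unsigned char>(j);
        row[3 * j + 1] = static_cast<unsigned char>(j + 100);
        row[3 * j + 2] = static_cast<unsigned char>(j + 200);
    }
    return row;
}
```

With a row from makeRow(4), pixelAt(row.data(), 2) yields the three channel values stored at row[6], row[7] and row[8].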
Mat_<Vec3b> &U = reinterpret_cast<Mat_<Vec3b>&>(I);
Then you can freely use U(i, j) and always get a 3-tuple of unsigned chars, and therefore a whole pixel, again without any copying, just type casts (and therefore at the same performance as I.at<Vec3b>(i, j)).
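The idea of a thin typed wrapper can be sketched without OpenCV as well. The following is only an illustration under stated assumptions: RawImage, TypedView and Bgr8 are hypothetical names, rows are assumed to have no padding, and elements are returned by value through get/set rather than by reference as cv::Mat_ does. The point is that the view holds just a pointer to the untyped image, so creating it copies no pixel data and the element type is fixed once.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical untyped image: raw bytes plus dimensions (loosely like cv::Mat).
struct RawImage {
    std::size_t rows, cols, elemSize;
    std::vector<unsigned char> data;  // rows * cols * elemSize bytes, no row padding
};

// Hypothetical 8-bit, 3-channel pixel (loosely like cv::Vec3b).
struct Bgr8 {
    unsigned char v[3];
};

// Hypothetical typed view (loosely like cv::Mat_<T>): it stores only a pointer
// to the image, so constructing it copies no pixel data.
template <typename T>
class TypedView {
public:
    explicit TypedView(RawImage& img) : img_(&img) {}

    // Read element (i, j); the element type is fixed once by T.
    T get(std::size_t i, std::size_t j) const {
        T value;
        std::memcpy(&value, img_->data.data() + offset(i, j), sizeof(T));
        return value;
    }

    // Write element (i, j) straight into the underlying buffer.
    void set(std::size_t i, std::size_t j, const T& value) {
        std::memcpy(img_->data.data() + offset(i, j), &value, sizeof(T));
    }

private:
    // Byte offset of element (i, j) in a row-major, unpadded buffer.
    std::size_t offset(std::size_t i, std::size_t j) const {
        return (i * img_->cols + j) * sizeof(T);
    }

    RawImage* img_;
};
```

Writing through the view, for example view.set(1, 2, px) on a 2x3 image, changes the bytes of the original RawImage directly, since the view and the image share the same buffer.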