for_each (begin(a), end(a), [] (int x) { cout<<x<<" ";});
begin(a) gives int(*)[3] (a pointer to an array of 3 ints), and dereferencing it gives int(&)[3], while your lambda expects an int argument.
for_each (begin(a[0]), end(a[2]), [] (int x) { cout<<x<<" ";});
begin(a[0]) gives an int* that points to the first element of the first row of a, and end(a[2]) gives an int* that points one past the last element of the last row of a, so everything works.
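As a minimal sketch of the two cases above, the static_asserts below check the types that begin(a) produces, and a helper walks the whole array through int* iterators. The alias Grid and the function flatten_to_string are hypothetical names introduced only for this illustration, and the traversal assumes the rows of a are laid out contiguously, as they are for a built-in 2D array.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

using Grid = int[3][3];  // hypothetical alias for the array type of `a`

// begin(a) is a pointer to a whole row, not to an int.
static_assert(std::is_same<decltype(std::begin(std::declval<Grid&>())),
                           int (*)[3]>::value,
              "begin(a) is int(*)[3]");
// Dereferencing it yields a reference to a row, which a lambda taking int cannot accept.
static_assert(std::is_same<decltype(*std::begin(std::declval<Grid&>())),
                           int (&)[3]>::value,
              "*begin(a) is int(&)[3]");

// Hypothetical helper: writes every element of `a`, space-separated, into a string.
std::string flatten_to_string(Grid& a) {
    std::ostringstream out;
    // begin(a[0]) points at a[0][0]; end(a[2]) points one past a[2][2].
    std::for_each(std::begin(a[0]), std::end(a[2]),
                  [&out](int x) { out << x << " "; });
    return out.str();
}
```

Calling flatten_to_string on an array initialized with {{1,2,3},{4,5,6},{7,8,9}} visits the nine elements in row order.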
Now for the range-based for part.
If you remove & from for (auto& row : a), the error actually occurs in the nested for (auto x : row). This is due to the way the range-based for is specified. The part relevant to your use case is:
If __range is an array, then begin_expr is __range and end_expr is (__range + __bound), where __bound is the number of elements in the array (if the array has unknown size or is of an incomplete type, the program is ill-formed)
Here I will refer to the identifiers mentioned in the Explanation section on the linked page.
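To make the array rule concrete, here is a rough hand-expansion of the two nested loops. It is only a sketch: the names range_, begin_, end_ and sum_expanded are hypothetical stand-ins (the real __range, __begin and __end are exposition-only and cannot be spelled in user code), and the bound 3 is written out by hand.

```cpp
// Hypothetical helper: sums a 3x3 array using a hand-expanded range-based for.
int sum_expanded(int (&a)[3][3]) {
    int total = 0;

    auto&& range_ = a;         // int(&)[3][3]
    auto begin_ = range_;      // the array decays: int(*)[3]
    auto end_ = range_ + 3;    // __range + __bound, also int(*)[3]
    for (; begin_ != end_; ++begin_) {
        auto& row = *begin_;   // range_declaration `auto& row`: int(&)[3]

        // The inner loop repeats the same expansion on `row`.
        auto&& inner_range = row;            // int(&)[3]
        auto inner_begin = inner_range;      // int*
        auto inner_end = inner_range + 3;    // int*
        for (; inner_begin != inner_end; ++inner_begin) {
            auto x = *inner_begin;           // int
            total += x;
        }
    }
    return total;
}
```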
Consider the case of for (auto& row : a) :
__range is deduced as int(&)[3][3] (a reference to an array of 3 arrays of 3 ints). __begin is then deduced as int(*)[3] (a pointer to an array of 3 ints), because __range decays to a pointer to the first row of the 2D array. The range_declaration is auto& row, so row is deduced as int(&)[3] (a reference to an array of 3 ints).
The same process then repeats for the inner range-based for. There, __range is int(&)[3], so the array rule quoted above applies again, and the rest of the type deduction proceeds just as described above:
__range = int(&)[3]
__begin = int*
x = int
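The deduced types for row and x can be checked at compile time. The following is a small sketch; the function name count_with_reference_rows is hypothetical and exists only to give the loops somewhere to live.

```cpp
#include <type_traits>

// Hypothetical helper: counts the elements visited while checking deduced types.
int count_with_reference_rows() {
    int a[3][3] = {};
    int visited = 0;
    for (auto& row : a) {
        // `auto&` binds to the row itself, so no decay happens.
        static_assert(std::is_same<decltype(row), int (&)[3]>::value,
                      "row is int(&)[3]");
        for (auto x : row) {
            static_assert(std::is_same<decltype(x), int>::value, "x is int");
            (void)x;  // only the type matters here
            ++visited;
        }
    }
    return visited;
}
```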
Now consider the case of for (auto row : a) :
__range, __begin and __end are all deduced the same way as before. The crucial difference in this case is the range_declaration auto row: dereferencing __begin (an int(*)[3]) yields an int(&)[3], and initializing a by-value auto variable from it makes the array decay to a pointer. This means that row is deduced as int*, and none of the three bullets that describe how begin_expr / end_expr are determined handles a raw pointer. The result is a compilation error in the nested for loop.
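The decay can be confirmed the same way. In this sketch (the function name sum_first_of_each_row is hypothetical) the nested loop is left commented out because it would not compile, while indexing through the pointer still works.

```cpp
#include <type_traits>

// Hypothetical helper: adds up the first element of every row.
int sum_first_of_each_row() {
    int a[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int total = 0;
    for (auto row : a) {
        // By-value `auto` makes the int[3] row decay to a pointer.
        static_assert(std::is_same<decltype(row), int*>::value, "row is int*");

        // for (auto x : row) {}  // error: a raw int* has no begin/end

        total += row[0];  // row still points at the first element of its row
    }
    return total;
}
```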