CloudWatch logs act weird

I have two log files with multi-line log entries. Both use the same date and time format at the beginning of each log statement. The configuration looks like this:

 state_file = /var/lib/awslogs/agent-state

 [/opt/logdir/log1.0]
 datetime_format = %Y-%m-%d %H:%M:%S
 file = /opt/logdir/log1.0
 log_stream_name = /opt/logdir/logs/log1.0
 initial_position = start_of_file
 multi_line_start_pattern = {datetime_format}
 log_group_name = my.log.group

 [/opt/logdir/log2-console.log]
 datetime_format = %Y-%m-%d %H:%M:%S
 file = /opt/logdir/log2-console.log
 log_stream_name = /opt/logdir/log2-console.log
 initial_position = start_of_file
 multi_line_start_pattern = {datetime_format}
 log_group_name = my.log.group

The CloudWatch Logs agent sends the log1.0 entries to my log group correctly, but it does not send anything for log2-console.log.

awslogs.log says:

 2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196444000, 'start_position': 42330916L, 'end_position': 42331504L}, reason: timestamp is more than 2 hours in future.
 2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196451000, 'start_position': 42331504L, 'end_position': 42332092L}, reason: timestamp is more than 2 hours in future.

The server time is correct, though. It is also strange that the offsets given as start_position and end_position do not exist in the log file currently being watched.

Anyone else having this problem?

3 answers

I was able to fix it.

The state of awslogs was broken. The state is stored in an sqlite database at /var/awslogs/state/agent-state. You can access it with

 sudo sqlite3 /var/awslogs/state/agent-state 

sudo is required to get access to the database.

List all streams with

 select * from stream_state; 

Take a look at your log stream and note the source_id, which is part of the JSON data structure in column v.

Then list all entries with this source_id (in my case it was 7675f84405fcb8fe5b6bb14eaa0c4bfd) in the push_state table:

 select * from push_state where k="7675f84405fcb8fe5b6bb14eaa0c4bfd"; 

The resulting record has a JSON data structure in column v that contains a batch_timestamp. That batch_timestamp seemed to be wrong: it was stuck in the past, so any newer log entries (more than 2 hours newer) were no longer processed.
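The effect described above can be modeled with a small sketch. This is a hypothetical reconstruction of the skip check based only on the warning message and the observed behavior, not the agent's actual code; the function name is made up:

```python
TWO_HOURS_MS = 2 * 60 * 60 * 1000  # the agent's skip threshold, in milliseconds

def is_skipped(event_timestamp_ms, batch_timestamp_ms):
    """Return True if an event would be rejected as 'more than 2 hours
    in future' relative to a stale batch_timestamp (a guess at the
    behavior described above, not the agent's real implementation)."""
    return event_timestamp_ms - batch_timestamp_ms > TWO_HOURS_MS
```

With a batch_timestamp stuck in the past, every fresh log line trips this check, which matches the warnings shown in awslogs.log.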

The solution is to update this record. Copy column v, replace the batch_timestamp with the current timestamp, and run an update along the lines of

 update push_state set v='... insert new value here ...' where k='7675f84405fcb8fe5b6bb14eaa0c4bfd'; 
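The manual select/edit/update steps above can also be scripted. A minimal sketch, assuming column v holds JSON with a batch_timestamp field in milliseconds as described; the path and source_id are taken from this answer, so adapt them to your system (and stop the agent first):

```python
import json
import sqlite3
import time

DB_PATH = "/var/awslogs/state/agent-state"
SOURCE_ID = "7675f84405fcb8fe5b6bb14eaa0c4bfd"  # replace with your source_id

def bump_batch_timestamp(conn, source_id):
    """Set batch_timestamp in push_state.v to the current time (ms).

    Returns True if a row was found and updated, False otherwise.
    """
    row = conn.execute(
        "select v from push_state where k = ?", (source_id,)
    ).fetchone()
    if row is None:
        return False
    state = json.loads(row[0])
    state["batch_timestamp"] = int(time.time() * 1000)
    conn.execute(
        "update push_state set v = ? where k = ?",
        (json.dumps(state), source_id),
    )
    conn.commit()
    return True

if __name__ == "__main__":
    with sqlite3.connect(DB_PATH) as conn:
        bump_batch_timestamp(conn, SOURCE_ID)
```

Run it with sudo (the database is root-owned), then restart the service as shown below.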

Restart the service using

 sudo /etc/init.d/awslogs restart 

I hope this works for you!


We had the same problem, and the following steps fixed it.

If the log groups are not being updated with the latest events, complete these steps:

  • Stopped the awslogs service.
  • Deleted the file /var/awslogs/state/agent-state.
  • Updated /var/awslogs/etc/awslogs.conf to use the instance id instead of the hostname, e.g.:

     log_stream_name = {hostname} to log_stream_name = {instance_id} 
  • Started the awslogs service.

I managed to solve this problem on Amazon Linux:

  • sudo yum reinstall awslogs
  • sudo service awslogs restart

This method preserved my configuration files in /var/awslogs/, though you may want to back them up before reinstalling.

Note: while troubleshooting, I also deleted my log group through the AWS console. The restart re-uploaded all historical logs completely, but with the current timestamp, which makes them less useful. I'm not sure whether deleting the log group was necessary for this method to work. You might want to set initial_position to end_of_file before restarting.


Source: https://habr.com/ru/post/1012332/

