New Relic for Monitoring Amazon Kinesis Workers

We use Amazon Kinesis (a queue service) and have queue readers written in Java. They mainly read from the queue and insert data into our data warehouse. Has anyone succeeded in using New Relic to monitor background queue workers?

Some analytics I am interested in:

  • How many queue workers are working right now? (they scale up and down depending on the load)
  • How many messages per second does each queue worker process? What does it look like over time?
  • How many messages per second does the entire fleet process?
  • Workers call MySQL and Cassandra. What fraction of their time is spent there?
  • We log with log4j. If workers generate errors / stack traces, what are they? What is the error rate over time?

Thanks,

Advait

+5
1 answer

New Relic has no trouble monitoring batch jobs as opposed to web transactions, so this will not be a problem.

Assuming you start with a Java application for which you have the source code, the best way forward is to use the agent API: https://docs.newrelic.com/docs/agents/java-agent/custom-instrumentation/java-agent-api . This gives you a way to report any metric you like, even ones we don't record automatically. I will answer your questions one by one:

1) There are several ways to slice this pie, but the simplest one I can think of is to call NewRelic.recordMetric("Custom/Queue_worker/alive", 1). I would just have a timer make this call once a minute (since that is our metric harvest cycle) from each worker. Then, in a custom dashboard ( https://docs.newrelic.com/docs/apm/dashboards-menu/custom-dashboards ), you can ignore the metric values (which will be averaged, so unless you have a master that knows the "value" and can report it as often as it likes, you won't get the desired effect: reporting 1 + 1 + 1 ... still averages to 1). Instead, you chart the call_count field to see how many workers checked in that minute.
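A minimal sketch of the once-a-minute heartbeat described above. The class name `WorkerHeartbeat` is hypothetical, and the recorder is passed in as a `BiConsumer` only so the sketch compiles without the agent jar; in a real worker the assumption is that you would pass `NewRelic::recordMetric` from the Java agent API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical helper: reports an "alive" data point on a fixed schedule. */
final class WorkerHeartbeat {
    static final String ALIVE_METRIC = "Custom/Queue_worker/alive";

    // Assumption: in production this is NewRelic::recordMetric (String name, float value).
    private final BiConsumer<String, Float> recorder;

    WorkerHeartbeat(BiConsumer<String, Float> recorder) {
        this.recorder = recorder;
    }

    /** Reports one data point. The value is irrelevant; only call_count is charted. */
    void beat() {
        recorder.accept(ALIVE_METRIC, 1f);
    }

    /** Schedules beat() at a fixed rate on a daemon thread and returns the timer so it can be shut down. */
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "worker-heartbeat");
            t.setDaemon(true); // do not keep the JVM alive just for the heartbeat
            return t;
        });
        timer.scheduleAtFixedRate(this::beat, 0, period, unit);
        return timer;
    }
}
```

Each worker would call something like `new WorkerHeartbeat(NewRelic::recordMetric).start(1, TimeUnit.MINUTES)` at startup, so the call count of the metric in a given minute approximates the number of live workers.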

2) Here you would use the same pattern as above, except that you create a separate custom metric for each worker. Fortunately, custom dashboards do the heavy lifting here: report something like NewRelic.recordMetric("Custom/Queue_worker/y/number_of_messages", x) once a minute, where x = the number of messages processed and y = some unique identifier per worker (a random GUID?), and then you can simply chart Custom/Queue_worker/*/number_of_messages so that they all appear on the same chart.
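The per-worker metric could be sketched roughly as follows. `WorkerThroughput` and its method names are hypothetical; the recorder is again a `BiConsumer` standing in for `NewRelic::recordMetric`, and the worker is assumed to call `flush()` from a once-a-minute timer.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

/** Hypothetical helper: counts messages and reports the per-minute total under a per-worker metric name. */
final class WorkerThroughput {
    private final String metricName;
    private final AtomicLong processed = new AtomicLong();

    // Assumption: in production this is NewRelic::recordMetric (String name, float value).
    private final BiConsumer<String, Float> recorder;

    WorkerThroughput(String workerId, BiConsumer<String, Float> recorder) {
        this.metricName = "Custom/Queue_worker/" + workerId + "/number_of_messages";
        this.recorder = recorder;
    }

    /** One way to get a unique identifier per worker, as suggested in the answer. */
    static String randomWorkerId() {
        return UUID.randomUUID().toString();
    }

    /** Call once for every message the worker finishes processing. */
    void messageProcessed() {
        processed.incrementAndGet();
    }

    /** Call once a minute: reports the count since the last flush and resets it. */
    void flush() {
        long count = processed.getAndSet(0);
        recorder.accept(metricName, (float) count);
    }

    String metricName() {
        return metricName;
    }
}
```

Reporting once per minute keeps each harvest cycle to a single data point per worker, so the averaged value on the dashboard is the actual per-minute count rather than an average of several partial counts.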

3) Have each worker report the same custom metric, Custom/queue_worker/message_sent, once for each message, and chart the call count of that metric. Once again, you can't just report a value per worker, since the metric data will be averaged together, but we will keep a good count for you.

4) You will get MySQL time for free (if you use the MySQL connector or another supported JDBC driver listed here: https://docs.newrelic.com/docs/agents/java-agent/getting-started/new-relic-java#h2-compatibility ) - it will appear as "database" in your charts and transaction traces. For Cassandra, we don't have specific instrumentation, but you can use the agent API again (NewRelic.recordResponseTimeMetric() is recommended) to at least record that time and chart it separately.
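For the Cassandra timing, a rough sketch of wrapping a call and reporting its duration. `TimedCall` and the metric name `Custom/Cassandra/query` are hypothetical; the recorder is an `ObjLongConsumer<String>` standing in for `NewRelic::recordResponseTimeMetric`, which takes a metric name and a duration in milliseconds.

```java
import java.util.function.ObjLongConsumer;
import java.util.function.Supplier;

/** Hypothetical helper: times a block of work and reports the elapsed milliseconds. */
final class TimedCall {
    // Hypothetical metric name; any name under "Custom/" would do.
    static final String CASSANDRA_METRIC = "Custom/Cassandra/query";

    private TimedCall() {}

    /**
     * Runs the work, then reports its duration even if the work throws.
     * Assumption: in production the recorder is NewRelic::recordResponseTimeMetric.
     */
    static <T> T time(String metric, ObjLongConsumer<String> recorder, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long millis = (System.nanoTime() - start) / 1_000_000L;
            recorder.accept(metric, millis);
        }
    }
}
```

A worker might then wrap each Cassandra call as `TimedCall.time(TimedCall.CASSANDRA_METRIC, NewRelic::recordResponseTimeMetric, () -> runQuery())`, where `runQuery()` is a placeholder for whatever client call the worker already makes.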

5) You get an error rate for free if your errors are unhandled exceptions, or you can call NewRelic.noticeError() whenever you catch an exception (or hit any error condition you want to flag). In addition, if errors surface as unhandled exceptions (neat trick: handle the exception in your code and then rethrow it so that our agent sees it with context), you will get a stack trace and any transaction metadata recorded with NewRelic.addCustomParameter(). We do not process log files, although you could write a very small program to parse the log and report metrics using the methods above, and since we license per host and not per agent, you can run it on an already licensed host at no additional cost.
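One way the explicit error reporting might look, as a sketch under assumptions. `ErrorReportingRunner` is a hypothetical wrapper around a worker's per-message logic; the reporter is a `Consumer<Throwable>` standing in for `NewRelic::noticeError`.

```java
import java.util.function.Consumer;

/** Hypothetical helper: runs a unit of work and reports any runtime failure. */
final class ErrorReportingRunner {
    // Assumption: in production this is NewRelic::noticeError (the Throwable overload).
    private final Consumer<Throwable> errorReporter;

    ErrorReportingRunner(Consumer<Throwable> errorReporter) {
        this.errorReporter = errorReporter;
    }

    /**
     * Runs the work and reports a failure instead of letting it kill the worker loop.
     * Returns true on success and false if an exception was reported.
     */
    boolean runSafely(Runnable work) {
        try {
            work.run();
            return true;
        } catch (RuntimeException e) {
            errorReporter.accept(e);
            return false;
        }
    }
}
```

If you would rather let the agent capture the exception with its transaction context, the catch block could rethrow after any local cleanup instead of swallowing the exception, which is the "neat trick" mentioned above.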

Much of this is easier with Insights ( https://docs.newrelic.com/docs/insights/new-relic-insights ) - for example, you can get the list of running agents over time without any extra work, and you can report numbers that are not averaged, which you can then do math on and chart. But that is a separate product, and I'm not trying to upsell you :)

Note: I work at New Relic.

+2

Source: https://habr.com/ru/post/1204454/

