An open-source/in-house application performance management server that monitors your services and allows you to perform historical/real-time analysis for your endpoints.
First, make sure to initialize the schema in both Cassandra and Elasticsearch:
./elastic_init_index.sh
Then, in cqlsh:
create keyspace simple_apm with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
use simple_apm;
create table request_info (service_name text, url text, method text, status smallint, response_time int, created_at timestamp, primary key (service_name, created_at, method, url));
To run the server
# run one instance of the server to handle HTTP requests and push jobs into queues
make run_server
To run workers
# you need at least two types of workers online
make run_worker job_type=ES_SYNC target_queue=sync_es batch_size=5 # Elasticsearch sync worker
make run_worker job_type=DB_WRITE target_queue=cassendra_write batch_size=5 # Cassandra DB write worker
params:
- batch_size: number of jobs to handle by the worker per one execution
- target_queue: the job queue the worker will listen to
- job_type: the type of job the spawned worker will handle
List of environment variables that need to be set up before starting the service:
- PRODUCTION_QUEUES: comma-separated list of queues the server/producer pushes jobs into
- REDIS_CONNECTION_URL: connection URL used to set up the Redis client
- JWT_SECRET: secret used to sign and verify tokens between the SDK and the server
- CASSENDRA_KEY_SPACE: name of the keyspace where the tables are stored
- CASSENDRA_HOSTS: the Cassandra nodes that are under simple-apm
- SERVER_PORT: port used by the HTTP server
- ES_BULK_INSERTION_CONCURRENCY: max concurrency level used in bulk ES index updates
Available SDKs:
- JS: <github.com/Kareem-Emad/simple-apm-express>
You can build your own custom SDK in whatever language/way you want; just make sure you:
- sign a token with the same secret set here in the envs
- use the same request format:
curl --location --request POST 'http://localhost:5000/requests' \
--header 'Authorization: Bearer jwt_token_placeholder' \
--header 'Content-Type: application/json' \
--data-raw '{
"url": "https://google.com",
"http_method": "GET",
"response_time": 3000,
"service_name": "service_name",
"status_code": 200,
"created_at": "2020-04-02 02:10:01"
}'
- make sure you use the same date format, as the Elasticsearch index is optimized specifically for this layout as shown below (a minimal SDK sketch follows)
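To make the custom-SDK requirements concrete, here is a minimal sketch in Python, assuming the PyJWT and requests packages. The report_request helper, the token claims, and the HS256 algorithm are illustrative assumptions and have to match whatever the server actually verifies; the request body fields mirror the curl example above.

```python
# Hypothetical custom-SDK sketch: sign a token with the shared JWT_SECRET and
# report a handled request to the simple-apm server in the expected format.
import os
from datetime import datetime, timezone

import jwt       # PyJWT
import requests

APM_URL = "http://localhost:5000/requests"  # port comes from SERVER_PORT
SECRET = os.environ["JWT_SECRET"]


def report_request(service_name, url, method, status_code, response_time_ms):
    # Claims and algorithm are assumptions; they must match the server's verification.
    token = jwt.encode({"service_name": service_name}, SECRET, algorithm="HS256")
    payload = {
        "url": url,
        "http_method": method,
        "response_time": response_time_ms,
        "service_name": service_name,
        "status_code": status_code,
        # Elasticsearch expects exactly this layout: yyyy-MM-dd HH:mm:ss
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
    }
    resp = requests.post(
        APM_URL,
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=5,
    )
    resp.raise_for_status()


report_request("my_service", "https://google.com", "GET", 200, 3000)
```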
The Cassandra database contains one table called request_info, with the following fields:
Field | Data Type | Description |
---|---|---|
service_name | text/varchar | name of the service that contains this endpoint |
url | text/varchar | full URL of the route in this service |
method | text/varchar | type of the request handled by the endpoint |
status | small_int | code returned to the requestor |
response_time | int | time taken from receiving the request in the server to responding to the client |
created_at | timestamp | the timestamp when this request was made |
We have two types of indexes:
- partition index: set to the service_name column
- cluster index: set to (created_at, method, url), in that order
Note that queries against Cassandra should follow this key order to avoid latencies.
So it's better to always start by specifying the service_name, then the range of dates (created_at), then the HTTP method (method), and finally the endpoints you want to include in the query result (url), as in the sketch below.
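As an illustration of that access pattern, here is a minimal sketch using the DataStax cassandra-driver package (an assumption; any client works). The host and keyspace are placeholders standing in for CASSENDRA_HOSTS and CASSENDRA_KEY_SPACE.

```python
# Minimal sketch of querying request_info following the key order, assuming
# the cassandra-driver package and a local node (placeholder values below).
from datetime import datetime

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])           # CASSENDRA_HOSTS
session = cluster.connect("simple_apm")    # CASSENDRA_KEY_SPACE

rows = session.execute(
    """
    SELECT method, url, status, response_time, created_at
    FROM request_info
    WHERE service_name = %s
      AND created_at >= %s AND created_at <= %s
    """,
    ("YOUR_SERVICE_NAME", datetime(2020, 4, 1), datetime(2020, 4, 10)),
)
for row in rows:
    print(row.created_at, row.method, row.url, row.status, row.response_time)

# Note: once created_at is restricted by a range, Cassandra only allows further
# restrictions on the later clustering columns (method, url) with ALLOW FILTERING,
# so keep the date range as the last restricted clustering column when possible.
```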
The Elasticsearch index request_info contains the following fields (an illustrative mapping sketch follows the table):

Field | Data Type |
---|---|
service_name | keyword |
url | text |
http_method | keyword |
status_code | short |
response_time | long |
created_at | date in format yyyy-MM-dd HH:mm:ss |
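For reference, a hedged sketch of the mapping the table above corresponds to. The index is normally created by ./elastic_init_index.sh, so this is illustrative only; the client call and any settings beyond the listed field types are assumptions.

```python
# Illustrative only: create a request_info index with the mapping described in
# the table above, assuming the official elasticsearch Python client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="request_info",
    body={
        "mappings": {
            "properties": {
                "service_name": {"type": "keyword"},
                "url": {"type": "text"},
                "http_method": {"type": "keyword"},
                "status_code": {"type": "short"},
                "response_time": {"type": "long"},
                "created_at": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
            }
        }
    },
)
```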
To access data through the Elasticsearch server, there are already some ready-to-use queries to fetch important data like throughput, average_response_time, min/max_response_time, and x_percentile.
To calculate throughput, we need to use date histograms in Elasticsearch.
On the assumption that throughput is the number of successful requests handled per minute, we can achieve it with a query/aggregate like this one:
curl -X GET "localhost:9200/request_info/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [ {"match": {"service_name": "YOUR_SERVICE_NAME"}} , {"match":{"url": "SOME_URL"}}, {"match":{"http_method": "GET"}}],
"filter": {
"range": {
"created_at": {
"gte": "START_DATE IN FORMAT (2020-04-01 02:01:01)",
"lte": "END_DATE IN FORMAT (2020-04-10 02:01:01)"
}
}
}
}
}, "aggs": {
"throughput_over_time": {
"date_histogram": {
"field": "created_at",
"fixed_interval": "1m"
}
}
}
}
'
Note that you can omit some of the match queries in the must clause to calculate the throughput curve over your whole service or over a specific set of endpoints.
Also notice that the interval is tunable through the fixed_interval field, meaning you can calculate the throughput of total successful requests per minute/hour/day/month/etc., as in the sketch below.
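If you prefer calling Elasticsearch from code instead of curl, here is a hedged sketch of the same throughput aggregation, assuming the official elasticsearch Python client (7.x-style body= call); the service name, dates, and interval are placeholders you would tune.

```python
# Run the throughput date_histogram aggregation against the request_info index,
# assuming the elasticsearch Python client and a local node on port 9200.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

body = {
    "size": 0,  # only the aggregation buckets are needed
    "query": {
        "bool": {
            "must": [
                {"match": {"service_name": "YOUR_SERVICE_NAME"}},
                {"match": {"http_method": "GET"}},
            ],
            "filter": {
                "range": {
                    "created_at": {
                        "gte": "2020-04-01 02:01:01",
                        "lte": "2020-04-10 02:01:01",
                    }
                }
            },
        }
    },
    "aggs": {
        "throughput_over_time": {
            # change fixed_interval to "1h"/"1d"/... for per-hour/day throughput
            "date_histogram": {"field": "created_at", "fixed_interval": "1m"}
        }
    },
}

result = es.search(index="request_info", body=body)
for bucket in result["aggregations"]["throughput_over_time"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```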
To get the full stats for a certain set of endpoints or a whole service, we can use this query/aggregate:
curl -X GET "localhost:9200/request_info/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [ {"match": {"service_name": "YOUR_SERVICE_NAME"}} , {"match":{"url": "SOME_URL"}}, {"match":{"http_method": "POST"}}],
"filter": {
"range": {
"created_at": {
"gte": "START_DATE",
"lte": "END_DATE"
}
}
}
}
}, "aggs": {
"request_percentiles": {
"percentiles": {
"field": "response_time"
}
},
"request_stats":{
"stats":{
"field": "response_time"
}
}
}
}
'
Note that you need to specify the period of your search in the range filter in both curls to get your stats within a certain timeline (last week/month/...).