Knowledge is Yours - Nguyen Si Nhan: 2022

Wednesday, December 28, 2022

PRINCIPLES OF CHAOS ENGINEERING

Last Update: 2019 March

Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.

Advances in large-scale, distributed software systems are changing the game for software engineering. As an industry, we are quick to adopt practices that increase flexibility of development and velocity of deployment. An urgent question follows on the heels of these benefits: How much confidence can we have in the complex systems that we put into production?

Even when all of the individual services in a distributed system are functioning properly, the interactions between those services can cause unpredictable outcomes. Unpredictable outcomes, compounded by rare but disruptive real-world events that affect production environments, make these distributed systems inherently chaotic.

We need to identify weaknesses before they manifest in system-wide, aberrant behaviors. Systemic weaknesses could take the form of: improper fallback settings when a service is unavailable; retry storms from improperly tuned timeouts; outages when a downstream dependency receives too much traffic; cascading failures when a single point of failure crashes; etc. We must address the most significant weaknesses proactively, before they affect our customers in production. We need a way to manage the chaos inherent in these systems, take advantage of increasing flexibility and velocity, and have confidence in our production deployments despite the complexity that they represent.

An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. We call this Chaos Engineering.

CHAOS IN PRACTICE

To specifically address the uncertainty of distributed systems at scale, Chaos Engineering can be thought of as the facilitation of experiments to uncover systemic weaknesses. These experiments follow four steps:

1. Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
2. Hypothesize that this steady state will continue in both the control group and the experimental group.
3. Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.

The harder it is to disrupt the steady state, the more confidence we have in the behavior of the system. If a weakness is uncovered, we now have a target for improvement before that behavior manifests in the system at large.
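The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: steady state is reduced to a single error rate, the function names `error_rate` and `hypothesis_holds` are made up for this example, and the 1% tolerance is an arbitrary choice rather than a recommended value.

```python
def error_rate(outcomes):
    """Fraction of failed requests; `outcomes` is a list of booleans (True = success)."""
    if not outcomes:
        raise ValueError("need at least one observation")
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes)


def hypothesis_holds(control, experiment, tolerance=0.01):
    """Return True when the experimental group's error rate stays within
    `tolerance` of the control group's, i.e. steady state was not disrupted."""
    return abs(error_rate(experiment) - error_rate(control)) <= tolerance
```

A result of False means the hypothesis was disproved and a weakness was found, which is the useful outcome of an experiment.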

ADVANCED PRINCIPLES

The following principles describe an ideal application of Chaos Engineering, applied to the processes of experimentation described above. The degree to which these principles are pursued strongly correlates to the confidence we can have in a distributed system at scale.

Build a Hypothesis around Steady State Behavior

Focus on the measurable output of a system, rather than internal attributes of the system. Measurements of that output over a short period of time constitute a proxy for the system’s steady state. The overall system’s throughput, error rates, latency percentiles, etc. could all be metrics of interest representing steady state behavior. By focusing on systemic behavior patterns during experiments, Chaos verifies that the system does work, rather than trying to validate how it works.
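As a rough sketch of what such a steady-state proxy could look like, the Python below computes throughput, error rate and latency percentiles from request samples collected over a short window. The sample format (a list of `(latency_ms, ok)` tuples) and the function names are assumptions for illustration only; a real system would normally read these numbers from its monitoring stack.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def steady_state(samples, window_seconds):
    """Summarize a window of (latency_ms, ok) samples into steady-state metrics."""
    latencies = [latency for latency, _ in samples]
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "throughput_rps": len(samples) / window_seconds,
        "error_rate": errors / len(samples),
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
    }
```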

Vary Real-world Events

Chaos variables reflect real-world events. Prioritize events either by potential impact or estimated frequency. Consider events that correspond to hardware failures like servers dying, software failures like malformed responses, and non-failure events like a spike in traffic or a scaling event. Any event capable of disrupting steady state is a potential variable in a Chaos experiment.

Run Experiments in Production

Systems behave differently depending on environment and traffic patterns. Since the behavior of utilization can change at any time, sampling real traffic is the only way to reliably capture the request path. To guarantee both authenticity of the way in which the system is exercised and relevance to the current deployed system, Chaos strongly prefers to experiment directly on production traffic.

Automate Experiments to Run Continuously

Running experiments manually is labor-intensive and ultimately unsustainable. Automate experiments and run them continuously. Chaos Engineering builds automation into the system to drive both orchestration and analysis.

Minimize Blast Radius

Experimenting in production has the potential to cause unnecessary customer pain. While there must be an allowance for some short-term negative impact, it is the responsibility and obligation of the Chaos Engineer to ensure the fallout from experiments is minimized and contained.
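One possible way to contain the blast radius, sketched in Python below, is to route only a small, deterministic slice of users into the experimental group and to stop as soon as that group degrades past a budget. Both functions are hypothetical helpers, not part of any chaos tool, and the 5% error budget is an arbitrary assumption.

```python
import hashlib


def in_experiment(user_id, percent):
    """Deterministically place roughly `percent`% of users in the experimental group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10000  # bucket in 0..9999
    return bucket < percent * 100


def should_abort(control_error_rate, experiment_error_rate, max_delta=0.05):
    """Stop the experiment when the experimental group degrades past the budget."""
    return experiment_error_rate - control_error_rate > max_delta
```

Hashing the user id keeps each user in the same group for the whole experiment, so the impact stays limited to the same small population.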

Chaos Engineering is a powerful practice that is already changing how software is designed and engineered at some of the largest-scale operations in the world. Where other practices address velocity and flexibility, Chaos specifically tackles systemic uncertainty in these distributed systems. The Principles of Chaos provide confidence to innovate quickly at massive scales and give customers the high quality experiences they deserve.

Join the ongoing discussion of the Principles of Chaos and their application in the Chaos Community.

Source: https://principlesofchaos.org/

Tuesday, December 27, 2022

[ k8s ] Frequently used commands in Kubernetes

- show node taints

kubectl get nodes -o json | jq '.items[].spec.taints'

- remove or untaint a node

kubectl taint nodes node1 key1=value1:NoSchedule-

- cordon all nodes

kubectl get nodes | awk '{if (NR!=1) {print $1}}' | xargs -I {} kubectl cordon {}
- Kubectl autocomplete : 

source <(kubectl completion bash) # set up autocomplete in bash into the current shell, bash-completion package should be installed first.
echo "source <(kubectl completion bash)" >> ~/.bashrc # add autocomplete permanently to your bash shell.
alias k=kubectl
complete -o default -F __start_kubectl k
- Get all pods in a node : 
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<nodename>
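If jq is not available, the taint listing from the first command can also be done in Python. The sketch below assumes the text it receives is the output of kubectl get nodes -o json; the function name `taints_by_node` is made up for this example.

```python
import json


def taints_by_node(nodes_json):
    """Map node name -> list of taint strings from `kubectl get nodes -o json` output."""
    result = {}
    for item in json.loads(nodes_json)["items"]:
        # Nodes without taints have no "taints" key under "spec".
        taints = item.get("spec", {}).get("taints") or []
        formatted = []
        for taint in taints:
            if "value" in taint:
                formatted.append(f'{taint["key"]}={taint["value"]}:{taint["effect"]}')
            else:
                formatted.append(f'{taint["key"]}:{taint["effect"]}')
        result[item["metadata"]["name"]] = formatted
    return result
```

The returned strings use the same key=value:effect form that kubectl taint accepts, so they can be reused when removing a taint.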

Thursday, December 22, 2022

[spinnaker export pipeline template] TypeError: Cannot read property 'value' of undefined.

Exporting a pipeline template fails with this error:

TypeError: Cannot read property 'value' of undefined.

You can fix it by running the hal config commands below:

hal config features edit --pipeline-templates true
hal config features edit --managed-pipeline-templates-v2-ui true

Then apply them with:

hal deploy apply

Done!

Friday, November 18, 2022

Create a short link with a custom domain using Firebase Dynamic Links and curl

curl 'https://firebasedynamiclinks.googleapis.com/v1/shortLinks?key=xxxx' --header 'Content-Type: application/json' --data '
{
  "dynamicLinkInfo": {
    "domainUriPrefix": "https://yourdomain",
    "link": "https://yourlinkyouwant_to_redirectto",
    "analyticsInfo": {
      "googlePlayAnalytics": {
        "utmSource": "nhannguyen",
        "utmMedium": "test",
        "utmCampaign": "smartpos"
      }
    }
  },
  "suffix": {
    "option": "SHORT"
  }
}'
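The same request can be prepared from Python with only the standard library. This is a minimal sketch: `build_short_link_request` is a hypothetical helper, the key and URLs are the placeholders from the curl command, and the function only builds the request object. Sending it would be a separate call to urllib.request.urlopen.

```python
import json
import urllib.request

API_URL = "https://firebasedynamiclinks.googleapis.com/v1/shortLinks"


def build_short_link_request(api_key, domain_prefix, link, utm=None):
    """Build (but do not send) the POST request that creates a short link."""
    payload = {
        "dynamicLinkInfo": {"domainUriPrefix": domain_prefix, "link": link},
        "suffix": {"option": "SHORT"},
    }
    if utm:
        # utm is a dict such as {"utmSource": ..., "utmMedium": ..., "utmCampaign": ...}
        payload["dynamicLinkInfo"]["analyticsInfo"] = {"googlePlayAnalytics": utm}
    return urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Building the body with json.dumps avoids the missing commas and braces that are easy to introduce when the JSON is typed by hand.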

Monday, November 7, 2022

Solved: [ kubernetes ] client intended to send too large body

To set client_max_body_size in ingress-nginx-controller, add this annotation to the Ingress YAML file:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "20m"

.......

Note:

1. 20m means 20 MB, and there is no need to restart the nginx pod.

2. Some YAML templates already carry the annotation ingress.kubernetes.io/proxy-body-size: "<valuem>", but that is not enough. You must add nginx.ingress.kubernetes.io/proxy-body-size, with the nginx. prefix at the start of the key.
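To see what a value such as "20m" means in bytes, here is a small Python sketch of how nginx size suffixes are interpreted (k = 1024 bytes, m = 1024 * 1024 bytes). The helper name is made up, and support for the g suffix is included on the assumption that the directive accepts offset-style sizes.

```python
def nginx_size_to_bytes(size):
    """Convert an nginx size string such as '20m' or '512k' to a number of bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = size.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # no suffix: the value is already in bytes
```

With this, "20m" comes out as 20971520 bytes, so an upload must stay below that size to avoid the "client intended to send too large body" error.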

Sunday, October 9, 2022

Upgrade nginx on the fly - no downtime

# first, check the pid file:

cat /var/run/nginx.pid

# copy the new binary to /sbin/nginx and force the overwrite; without the "-f" option you will see this error:
# cp: cannot create regular file ‘/sbin/nginx’: Text file busy
/bin/cp -f nginx /sbin/nginx

#spawn a new nginx master/workers set

kill -s USR2 `cat /var/run/nginx.pid`

#check process

ps aux | grep nginx

# check pid

tail -n +1 /var/run/nginx.pid*

# gracefully shut down the old master's workers

kill -s WINCH `cat /var/run/nginx.pid.oldbin`

#check
ps aux | grep nginx

# safely shut down the old master process

kill -s QUIT `cat /var/run/nginx.pid.oldbin`

Solved : ./configure: error: the Google perftools module requires the Google perftools library

./configure: error: the Google perftools module requires the Google perftools library. You can either do not enable the module or install the library.

To fix the above error, run this command:

yum install gperftools-devel

Saturday, October 8, 2022

Understanding naxsilogs

NAXSI_FMT

NAXSI_FMT lines are output by naxsi in your error log:

2013/11/10 07:36:19 [error] 8278#0: *5932 NAXSI_FMT: ip=X.X.X.X&server=Y.Y.Y.Y&uri=/phpMyAdmin-2.8.2/scripts/setup.php&learning=0&vers=0.52&total_processed=472&total_blocked=204&block=0&cscore0=$UWA&score0=8&zone0=HEADERS&id0=42000227&var_name0=user-agent, client: X.X.X.X, server: blog.memze.ro, request: "GET /phpMyAdmin-2.8.2/scripts/setup.php HTTP/1.1", host: "X.X.X.X"

Here, the request from client X.X.X.X to server Y.Y.Y.Y triggered rule 42000227 in the variable named user-agent in the HEADERS zone. The rule id might seem obscure, but you can look up its meaning in naxsi_core.rules:

MainRule "str:<" "msg:html open tag" "mz:ARGS|URL|BODY|$HEADERS_VAR:Cookie" "s:$XSS:8" id:1302;

NAXSI_FMT is composed of different items :

ip : Client's ip
server : Requested Hostname (as seen in http header Host)
uri: Requested URI (without arguments, stops at ?)
learning: tells if naxsi was in learning mode (0/1)
vers : Naxsi version, only since 0.51
total_processed: Total number of requests processed by nginx's worker
total_blocked: Total number of requests blocked by (naxsi) nginx's worker
zoneN: Zone in which match happened (see "Zones" in the table below)
idN: The rule id that matched
var_nameN: Variable name in which match happened (optional)
cscoreN : named score tag
scoreN : associated named score value

Several groups of zone, id, var_name, cscore and score can be present in a single line.
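Because the NAXSI_FMT payload is shaped like a URL query string, it can be split with Python's standard library. The sketch below is only an illustration of the field layout described above: `parse_naxsi_fmt` is a made-up name, and it assumes the payload ends at ", client: " as in the sample line. Note that parse_qsl also decodes percent-escapes and turns + into a space.

```python
import re
from urllib.parse import parse_qsl


def parse_naxsi_fmt(line):
    """Split a NAXSI_FMT log line into global fields and a list of per-match groups."""
    match = re.search(r"NAXSI_FMT: (.*?)(?:, client: |$)", line)
    if match is None:
        return None
    fields = dict(parse_qsl(match.group(1), keep_blank_values=True))
    groups = {}
    for key in list(fields):
        # zone0, id0, var_name0, cscore0, score0 belong to match number 0, and so on.
        numbered = re.fullmatch(r"(zone|id|var_name|cscore|score)(\d+)", key)
        if numbered:
            index = int(numbered.group(2))
            groups.setdefault(index, {})[numbered.group(1)] = fields.pop(key)
    fields["matches"] = [groups[index] for index in sorted(groups)]
    return fields
```

Each entry in "matches" is one zone, id, var_name, cscore and score group, so a line with several groups yields several entries.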

NAXSI_EXLOG

NAXSI_EXLOG is a complement to naxsilogs. Along with exceptions, it contains actual content of the matched request. While NAXSI_FMT only contains IDs and location of exception, NAXSI_EXLOG provides actual content, allowing you to easily decide if it's a false positive or not.

Learning tools use this to their advantage. Extensive logging is enabled by adding the following line in your server {} section, but outside of your location block.

set $naxsi_extensive_log 1;

This feature is provided by runtime-modifiers.

2013/05/30 20:47:05 [debug] 10804#0:*1 NAXSI_EXLOG: ip=127.0.0.1&server=127.0.0.1&uri=/&id=1302&zone=ARGS&var_name=a&content=a<>bcd
2013/05/30 20:47:05 [error] 10804#0:*1 NAXSI_FMT: ip=127.0.0.1&server=127.0.0.1&uri=/&learning=0&vers=0.50&total_processed=1&total_blocked=1&zone0=ARGS&id0=1302&var_name0=a, client: 127.0.0.1, server: , request: "GET /?a=a<>bcd HTTP/1.0", host: "127.0.0.1"

Naxsi Internal IDs

"User defined" rules are supposed to have IDs > 1000.

IDs below 1000 are reserved for naxsi internal rules, which are usually related to protocol sanity and things that cannot be expressed through regular expressions or string matches.

Think twice before whitelisting one of those IDs, as it might partially/totally disable naxsi.

Reference: https://github.com/nbs-system/naxsi/wiki/naxsilogs

Tuesday, October 4, 2022

Solved: utils/geo_lookup.cc:131:32: error: invalid conversion from ‘const MMDB_s’ to ‘MMDB_s’ [-fpermissive]

If you see this error when compiling ModSecurity:

utils/geo_lookup.cc:131:32: error: invalid conversion from ‘const MMDB_s’ to ‘MMDB_s’ [-fpermissive]

it is because you are using an old version of the libmaxminddb library. To solve it, compile libmaxminddb manually from https://github.com/maxmind/libmaxminddb/releases:

sudo ./configure

sudo make

sudo make check

sudo make install

sudo ldconfig

Then compile ModSecurity again; it should be OK.

Solved: libtoolize: command not found

Solved:

yum install libtool

Sunday, October 2, 2022

Solved: python error: ImportError: No module named elasticsearch

You need to run the command below:

pip install elasticsearch

That's all !

Understanding naxsi rules

Rules are meant to search for patterns in parts of a request to detect attacks.

For example, to DROP any request containing the string 'zz' in any GET or POST argument: MainRule id:424242 "str:zz" "mz:ARGS|BODY" "s:DROP";

Rules can be present at location level (BasicRule) or at http level (MainRule).

Rules have the following schema :

Everything must be quoted with double quotes, except the id part.

ID (id:...)

id:num is the unique numerical ID of the rule, that will be used in NAXSI_FMT or whitelists.

IDs below 1000 are reserved for naxsi internal rules (protocol mismatch etc.)

Match Pattern

Match pattern can be a regular expression, a string match, or a call to a lib (libinjection) :

rx:foo|bar : will match foo or bar
str:foo|bar : will match foo|bar
d:libinj_xss : will match if libinjection says it's XSS (>= 0.55rc2)
d:libinj_sql : will match if libinjection says it's SQLi (>= 0.55rc2)

Using plain string match when possible is recommended, as it's way faster. All strings must be lowercase, since naxsi's matches are case insensitive.
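The difference between rx: and str: is easy to miss, so here is a small Python sketch of it. It is not naxsi code: `pattern_matches` is a made-up helper, Python's re module stands in for the regex engine naxsi uses, and the libinjection patterns are left out.

```python
import re


def pattern_matches(pattern, value):
    """Emulate naxsi's 'rx:' and 'str:' match patterns, case-insensitively."""
    kind, _, body = pattern.partition(":")
    if kind == "str":
        # Plain substring search: '|' has no special meaning here.
        return body.lower() in value.lower()
    if kind == "rx":
        # Regular expression: '|' means alternation.
        return re.search(body, value, re.IGNORECASE) is not None
    raise ValueError(f"unsupported pattern type: {kind}")
```

With this sketch, rx:foo|bar matches a value containing only "bar", while str:foo|bar matches only when the literal text "foo|bar" is present.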

Score (s:...)

s is the score section. You can create "named" counters: s:$FOOBAR:4 will increase the counter $FOOBAR by 4. One rule can increase several scores: s:$FOO:4,$BAR:8 will increase $FOO by 4 and $BAR by 8. A rule can also directly specify an action such as BLOCK (blocks the request in non-learning mode) or DROP (blocks the request even in learning mode). Named scores are later handled by CheckRules.
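To make the score syntax concrete, the Python sketch below splits a score section into named counters and an optional direct action, then adds the counters to a running total. The function names are hypothetical and this is not how naxsi itself is implemented; any token that does not start with $ is simply treated as an action.

```python
def parse_score(section):
    """Parse 's:$FOO:4,$BAR:8' or 's:DROP' into (named scores, direct action)."""
    body = section[2:] if section.startswith("s:") else section
    scores, action = {}, None
    for part in body.split(","):
        if part.startswith("$"):
            name, _, value = part.rpartition(":")
            scores[name] = int(value)
        else:
            action = part  # e.g. BLOCK or DROP
    return scores, action


def apply_scores(counters, scores):
    """Add a rule's named scores to the request's running counters."""
    for name, value in scores.items():
        counters[name] = counters.get(name, 0) + value
    return counters
```

The running counters are what a CheckRule would later compare against its threshold.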

MatchZone (mz:...)

Please refer to Match Zones for details.

mz is the match zone, defining which part of the request will be inspected by the rule.

In rules, all matchzones but $URL*: are treated as OR conditions :

MainRule id:4242 str:z "mz:$ARGS_VAR:X|BODY";

pattern 'z' will be searched in GET var 'X' and all BODY vars.

MainRule id:4242 str:z "mz:$ARGS_VAR:X|BODY|$URL_X:^/foo";

pattern 'z' will be searched in GET var 'X' and all BODY vars as long as URL starts with /foo.
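The two examples above can be read as "inspect any of these zones, but only if the URL restriction passes". The Python sketch below models that reading. It is a simplification under assumptions: the helper names are made up, a `$URL_X:` part is treated as a regular expression on the URL, a `$URL:` part is assumed to be an exact URL comparison, and a regular expression that itself contains `|` is not handled.

```python
import re


def parse_matchzone(mz):
    """Split 'mz:...' into the zones to inspect (OR) and the URL restrictions (AND)."""
    body = mz[3:] if mz.startswith("mz:") else mz
    zones, url_filters = [], []
    for part in body.split("|"):
        if part.startswith("$URL_X:"):
            url_filters.append(("rx", part[len("$URL_X:"):]))
        elif part.startswith("$URL:"):
            url_filters.append(("str", part[len("$URL:"):]))
        else:
            zones.append(part)
    return zones, url_filters


def url_allowed(url, url_filters):
    """Return True when the request URL satisfies every URL restriction."""
    for kind, pattern in url_filters:
        if kind == "rx" and re.search(pattern, url) is None:
            return False
        if kind == "str" and url != pattern:  # exact match is an assumption
            return False
    return True
```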

Starting from naxsi 0.55rc0, for unknown content-types, you can use the RAW_BODY match-zone. RAW_BODY rules look like this:

MainRule id:4241 s:DROP str:RANDOMTHINGS mz:RAW_BODY;

Rules in the RAW_BODY zone are only applied when:

The Content-type is unknown (which means naxsi doesn't know how to properly parse the request)
id 11 (which is the internal blocking rule for 'unknown content-type') is whitelisted.

Then, the full body (url decoded and with null-bytes replaced by '0') is passed to this set of rules. The full body is matched against the regexes or string matches.

Whitelists for RAW_BODY rules are actually written just like normal body rules, such as:

BasicRule wl:4241 "mz:$URL:/rata|BODY";

Human readable message (msg:...)

msg is a string describing the pattern. This is mostly used for analyzing and to have some human-understandable text.

Negative Keyword (negative)

negative is a keyword that can be used to make a negative rule. Score is applied when the rule doesn't match :

MainRule negative "rx:multipart/form-data|application/x-www-form-urlencoded" "msg:Content is neither mulipart/x-www-form.." "mz:$HEADERS_VAR:Content-type" "s:$EVADE:4" id:1402;

Reference : https://github.com/nbs-system/naxsi/wiki/rules-bnf

Solved : nginx: [emerg] Naxsi-Config : Incorrect line MainRule rx:select in /etc/nginx/waf/naxsi_core.rules:23

If you see this error on Red Hat (or CentOS) when testing the nginx config (nginx -t or a config reload):

nginx: [emerg] Naxsi-Config : Incorrect line MainRule rx:select|union|update|delete|insert|table|from|ascii|hex|unhex|drop|load_file|substr|group_concat|dumpfile (./naxsi/naxsi_src/naxsi_skeleton.c/973)... in /etc/nginx/waf/naxsi_core.rules:23

Here is how to solve it:

1. Get naxsi from https://github.com/wargio/naxsi:
git clone --recurse-submodules https://github.com/wargio/naxsi.git
2. Then recompile the module with the steps below:
- first cd into the nginx source directory, then run:
./configure --with-compat --add-dynamic-module=./naxsi/naxsi_src
- then run:
make modules
- finally, copy the module file into the nginx modules directory:
cp objs/ngx_http_naxsi_module.so /usr/lib64/nginx/modules/

Thanks Wargio !

Done !

Saturday, October 1, 2022

Create nginx systemd file

Create the file /lib/systemd/system/nginx.service with the content below:

[Unit]
Description=The NGINX HTTP and reverse proxy server
After=syslog.target network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/usr/sbin/nginx -s reload
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target

then run :

systemctl daemon-reload && systemctl enable nginx && systemctl start nginx