{"id":4082,"date":"2022-03-24T10:28:00","date_gmt":"2022-03-24T03:28:00","guid":{"rendered":"https:\/\/bluebik.com\/?post_type=blog&#038;p=4082"},"modified":"2025-04-10T14:45:03","modified_gmt":"2025-04-10T07:45:03","slug":"deep-reinforcement-learning","status":"publish","type":"blog","link":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/","title":{"rendered":"Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">\u0e43\u0e19\u0e1a\u0e23\u0e23\u0e14\u0e32 subfields \u0e02\u0e2d\u0e07 machine learning \u0e13 \u0e15\u0e2d\u0e19\u0e19\u0e35\u0e49\u0e04\u0e07\u0e44\u0e21\u0e48\u0e21\u0e35\u0e2d\u0e30\u0e44\u0e23\u0e17\u0e35\u0e48\u0e01\u0e33\u0e25\u0e31\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e01\u0e23\u0e30\u0e41\u0e2a\u0e21\u0e32\u0e01\u0e44\u0e1b\u0e01\u0e27\u0e48\u0e32 reinforce-ment learning \u0e2d\u0e35\u0e01\u0e41\u0e25\u0e49\u0e27 (\u0e15\u0e48\u0e2d\u0e44\u0e1b\u0e08\u0e30\u0e02\u0e2d\u0e40\u0e23\u0e35\u0e22\u0e01\u0e22\u0e48\u0e2d\u0e27\u0e48\u0e32 RL) \u0e0b\u0e36\u0e48\u0e07\u0e16\u0e49\u0e32\u0e40\u0e23\u0e32\u0e22\u0e49\u0e2d\u0e19\u0e01\u0e25\u0e31\u0e1a\u0e44\u0e1b\u0e0b\u0e31\u0e01 2-3 \u0e1b\u0e35\u0e17\u0e35\u0e48\u0e41\u0e25\u0e49\u0e27 \u0e15\u0e33\u0e41\u0e2b\u0e19\u0e48\u0e07\u0e19\u0e35\u0e49\u0e22\u0e31\u0e07\u0e04\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e02\u0e2d\u0e07 deep learning \u0e2d\u0e22\u0e39\u0e48\u0e40\u0e25\u0e22 \u0e0a\u0e31\u0e14\u0e40\u0e08\u0e19\u0e27\u0e48\u0e32\u0e42\u0e25\u0e01\u0e40\u0e1b\u0e25\u0e35\u0e48\u0e22\u0e19\u0e40\u0e23\u0e47\u0e27\u0e41\u0e04\u0e48\u0e44\u0e2b\u0e19\u0e01\u0e31\u0e19<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e1a\u0e17\u0e04\u0e27\u0e32\u0e21\u0e19\u0e35\u0e49\u0e15\u0e31\u0e49\u0e07\u0e43\u0e08\u0e40\u0e25\u0e48\u0e32\u0e41\u0e1a\u0e1a\u0e07\u0e48\u0e32\u0e22\u0e46\u0e43\u0e2b\u0e49\u0e04\u0e19\u0e17\u0e35\u0e48\u0e1e\u0e2d\u0e21\u0e35\u0e1e\u0e37\u0e49\u0e19\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a machine learning \u0e21\u0e32\u0e1a\u0e49\u0e32\u0e07 \u0e44\u0e14\u0e49\u0e40\u0e02\u0e49\u0e32\u0e43\u0e08\u0e27\u0e48\u0e32 RL \u0e04\u0e37\u0e2d\u0e2d\u0e30\u0e44\u0e23 \u0e41\u0e25\u0e49\u0e27 fit in \u0e2d\u0e22\u0e39\u0e48\u0e15\u0e23\u0e07\u0e44\u0e2b\u0e19\u0e43\u0e19 field \u0e02\u0e2d\u0e07\u0e27\u0e34\u0e0a\u0e32\u0e19\u0e35\u0e49<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Supervised, Unsupervised and RL<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e04\u0e07\u0e44\u0e21\u0e48\u0e02\u0e2d deep dive \u0e21\u0e32\u0e01\u0e43\u0e19 2 \u0e2d\u0e31\u0e19\u0e41\u0e23\u0e01 Supervised \u0e04\u0e37\u0e2d \u0e40\u0e23\u0e32\u0e23\u0e39\u0e49\u0e04\u0e48\u0e32\u0e02\u0e2d\u0e07 output variables \u0e2a\u0e48\u0e27\u0e19\u0e16\u0e49\u0e32\u0e40\u0e1b\u0e47\u0e19 Un-supervised \u0e04\u0e37\u0e2d \u0e40\u0e23\u0e32\u0e44\u0e21\u0e48\u0e23\u0e39\u0e49 \u0e43\u0e19\u0e02\u0e13\u0e30\u0e17\u0e35\u0e48 RL \u0e19\u0e31\u0e49\u0e19 \u0e1a\u0e32\u0e07\u0e15\u0e33\u0e23\u0e32\u0e01\u0e25\u0e48\u0e32\u0e27\u0e27\u0e48\u0e32\u0e40\u0e1b\u0e47\u0e19\u0e25\u0e39\u0e01\u0e1c\u0e2a\u0e21\u0e23\u0e30\u0e2b\u0e27\u0e48\u0e32\u0e07 Supervised \u0e01\u0e31\u0e1a Unsupervised \u0e41\u0e15\u0e48\u0e43\u0e19\u0e21\u0e38\u0e21\u0e21\u0e2d\u0e07\u0e2a\u0e48\u0e27\u0e19\u0e15\u0e31\u0e27\u0e41\u0e25\u0e49\u0e27 RL \u0e40\u0e1b\u0e47\u0e19\u0e2d\u0e35\u0e01 learning paradigm \u0e17\u0e35\u0e48\u0e2d\u0e22\u0e39\u0e48\u0e04\u0e19\u0e25\u0e30 setting \u0e01\u0e31\u0e1a\u0e2a\u0e2d\u0e07\u0e2d\u0e31\u0e19\u0e41\u0e23\u0e01\u0e42\u0e14\u0e22\u0e2a\u0e34\u0e49\u0e19\u0e40\u0e0a\u0e34\u0e07 \u0e40\u0e1e\u0e23\u0e32\u0e30\u0e43\u0e19 2 \u0e41\u0e1a\u0e1a\u0e41\u0e23\u0e01\u0e15\u0e31\u0e27 model \u0e2d\u0e22\u0e48\u0e32\u0e07\u0e19\u0e49\u0e2d\u0e22\u0e15\u0e49\u0e2d\u0e07\u0e1c\u0e48\u0e32\u0e19\u0e01\u0e32\u0e23 training \u0e01\u0e48\u0e2d\u0e19\u0e17\u0e35\u0e48\u0e08\u0e30\u0e19\u0e33\u0e44\u0e1b implement \u0e43\u0e19 real environment (\u0e02\u0e2d\u0e40\u0e23\u0e35\u0e22\u0e01\u0e22\u0e48\u0e2d\u0e27\u0e48\u0e32 env) \u0e43\u0e19\u0e02\u0e13\u0e30\u0e17\u0e35\u0e48 RL \u0e04\u0e37\u0e2d\u0e15\u0e31\u0e27 model \u0e40\u0e23\u0e35\u0e22\u0e19\u0e23\u0e39\u0e49\u0e43\u0e19 env \u0e40\u0e25\u0e22 \u0e42\u0e14\u0e22\u0e17\u0e35\u0e48 agent (\u0e15\u0e31\u0e27 model) \u0e43\u0e2a\u0e48 action (input) \u0e40\u0e02\u0e49\u0e32\u0e44\u0e1b\u0e43\u0e19 env (function \u0e17\u0e35\u0e48 map \u0e23\u0e30\u0e2b\u0e27\u0e48\u0e32\u0e07 input \u0e01\u0e31\u0e1a output) \u0e41\u0e25\u0e30 env \u0e01\u0e47\u0e08\u0e30\u0e2a\u0e48\u0e07 reward + state (outputs) \u0e01\u0e25\u0e31\u0e1a\u0e2d\u0e2d\u0e01\u0e21\u0e32 \u0e42\u0e14\u0e22\u0e17\u0e31\u0e49\u0e07\u0e2b\u0e21\u0e14\u0e19\u0e35\u0e49\u0e2d\u0e22\u0e39\u0e48\u0e1a\u0e19\u0e1e\u0e37\u0e49\u0e19\u0e10\u0e32\u0e19\u0e02\u0e2d\u0e07 probabilistic model \u0e0a\u0e37\u0e48\u0e2d sequential decision process \u0e02\u0e49\u0e2d\u0e2a\u0e31\u0e07\u0e40\u0e01\u0e15\u0e04\u0e37\u0e2d \u0e41\u0e21\u0e49\u0e27\u0e48\u0e32 outputs \u0e02\u0e2d\u0e07 RL \u0e08\u0e30\u0e21\u0e35 2 \u0e2d\u0e31\u0e19\u0e04\u0e37\u0e2d reward \u0e01\u0e31\u0e1a state \u0e02\u0e2d\u0e07 env \u0e41\u0e15\u0e48\u0e40\u0e1b\u0e49\u0e32\u0e2b\u0e21\u0e32\u0e22 (objective function) \u0e02\u0e2d\u0e07 RL \u0e04\u0e37\u0e2d\u0e01\u0e32\u0e23 maximize reward \u0e17\u0e35\u0e48\u0e08\u0e30\u0e44\u0e14\u0e49\u0e23\u0e31\u0e1a\u0e08\u0e32\u0e01 env \u0e40\u0e17\u0e48\u0e32\u0e19\u0e31\u0e49\u0e19 \u0e42\u0e14\u0e22\u0e17\u0e35\u0e48 state \u0e40\u0e1b\u0e47\u0e19\u0e41\u0e04\u0e48 learning co-factor \u0e40\u0e17\u0e48\u0e32\u0e19\u0e31\u0e49\u0e19 \u0e0b\u0e36\u0e48\u0e07\u0e15\u0e48\u0e32\u0e07\u0e08\u0e32\u0e01 traditional learning (Supervised &amp; Unsupervised) \u0e17\u0e35\u0e48\u0e21\u0e31\u0e01\u0e08\u0e30\u0e40\u0e2d\u0e32 outputs \u0e17\u0e31\u0e49\u0e07\u0e2b\u0e21\u0e14\u0e21\u0e32\u0e2a\u0e23\u0e49\u0e32\u0e07\u0e40\u0e1b\u0e47\u0e19 objective function<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"210\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image.png\" alt=\"\" class=\"wp-image-3982\" style=\"width:550px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-300x135.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e19\u0e2d\u0e01\u0e08\u0e32\u0e01\u0e40\u0e23\u0e37\u0e48\u0e2d\u0e07 training \u0e41\u0e25\u0e49\u0e27 RL \u0e22\u0e31\u0e07\u0e15\u0e48\u0e32\u0e07\u0e08\u0e32\u0e01 traditional learning \u0e2d\u0e22\u0e48\u0e32\u0e07\u0e19\u0e49\u0e2d\u0e22\u0e2d\u0e35\u0e01 2 \u0e08\u0e38\u0e14\u0e43\u0e2b\u0e0d\u0e48\u0e46 1) \u0e43\u0e19\u0e02\u0e13\u0e30\u0e17\u0e35\u0e48 traditional learning \u0e19\u0e31\u0e49\u0e19 assume \u0e27\u0e48\u0e32 input \u0e44\u0e21\u0e48\u0e44\u0e14\u0e49\u0e40\u0e1b\u0e25\u0e35\u0e48\u0e22\u0e19 state \u0e02\u0e2d\u0e07 function \u0e17\u0e35\u0e48 map \u0e23\u0e30\u0e2b\u0e27\u0e48\u0e32\u0e07 input \u0e01\u0e31\u0e1a output \u0e41\u0e15\u0e48 RL \u0e19\u0e31\u0e49\u0e19\u0e2d\u0e32\u0e08\u0e08\u0e30\u0e40\u0e1b\u0e25\u0e35\u0e48\u0e22\u0e19\u0e2b\u0e23\u0e37\u0e2d\u0e44\u0e21\u0e48\u0e40\u0e1b\u0e25\u0e35\u0e48\u0e22\u0e19 state \u0e02\u0e2d\u0e07 env \u0e01\u0e47\u0e44\u0e14\u0e49 2) \u0e43\u0e19 traditional learning \u0e40\u0e21\u0e37\u0e48\u0e2d\u0e40\u0e23\u0e32\u0e43\u0e2a\u0e48 input \u0e40\u0e23\u0e32\u0e41\u0e17\u0e1a\u0e08\u0e30\u0e23\u0e39\u0e49 output \u0e43\u0e19\u0e17\u0e31\u0e19\u0e17\u0e35 \u0e41\u0e15\u0e48\u0e43\u0e19 RL \u0e2b\u0e25\u0e32\u0e22\u0e46\u0e04\u0e23\u0e31\u0e49\u0e07\u0e40\u0e23\u0e32\u0e08\u0e30\u0e44\u0e21\u0e48\u0e23\u0e39\u0e49 output \u0e17\u0e31\u0e19\u0e17\u0e35 \u0e15\u0e31\u0e27\u0e2d\u0e22\u0e48\u0e32\u0e07\u0e17\u0e35\u0e48\u0e2a\u0e38\u0e14\u0e42\u0e15\u0e48\u0e07 \u0e40\u0e0a\u0e48\u0e19 \u0e01\u0e32\u0e23\u0e40\u0e25\u0e48\u0e19\u0e40\u0e01\u0e21\u0e17\u0e35\u0e48\u0e08\u0e30\u0e23\u0e39\u0e49\u0e27\u0e48\u0e32 reward \u0e0a\u0e19\u0e30\u0e2b\u0e23\u0e37\u0e2d regret \u0e41\u0e1e\u0e49\u0e08\u0e23\u0e34\u0e07\u0e46\u0e01\u0e47\u0e15\u0e48\u0e2d\u0e40\u0e21\u0e37\u0e48\u0e2d\u0e40\u0e01\u0e21 (learning) \u0e19\u0e31\u0e49\u0e19\u0e08\u0e1a\u0e41\u0e25\u0e49\u0e27\u0e40\u0e17\u0e48\u0e32\u0e19\u0e31\u0e49\u0e19 \u0e22\u0e31\u0e07\u0e21\u0e35\u0e04\u0e27\u0e32\u0e21\u0e41\u0e15\u0e01\u0e15\u0e48\u0e32\u0e07\u0e23\u0e30\u0e2b\u0e27\u0e48\u0e32\u0e07 traditional learning \u0e01\u0e31\u0e1a RL \u0e2d\u0e22\u0e39\u0e48\u0e2d\u0e35\u0e01\u0e2b\u0e25\u0e32\u0e22\u0e02\u0e49\u0e2d \u0e1a\u0e17\u0e04\u0e27\u0e32\u0e21\u0e19\u0e35\u0e49\u0e04\u0e07\u0e44\u0e21\u0e48\u0e25\u0e07\u0e25\u0e36\u0e01\u0e21\u0e32\u0e01\u0e01\u0e27\u0e48\u0e32\u0e19\u0e35\u0e49 \u0e41\u0e04\u0e48\u0e2d\u0e22\u0e32\u0e01\u0e08\u0e30\u0e40\u0e19\u0e49\u0e19\u0e22\u0e49\u0e33\u0e40\u0e09\u0e22\u0e46\u0e27\u0e48\u0e32 RL \u0e40\u0e1b\u0e47\u0e19\u0e04\u0e19\u0e25\u0e30 learning paradigm \u0e01\u0e31\u0e1a traditional learning models \u0e41\u0e04\u0e48\u0e19\u0e31\u0e49\u0e19<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Deep Learning + RL = \u0e02\u0e2d\u0e40\u0e25\u0e48\u0e19\u0e43\u0e2b\u0e21\u0e48\u0e02\u0e2d\u0e07 DS geeks<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e41\u0e01\u0e19\u0e2b\u0e25\u0e31\u0e01\u0e02\u0e2d\u0e07 deep learning \u0e04\u0e37\u0e2d Neural Network \u0e0b\u0e36\u0e48\u0e07\u0e40\u0e1b\u0e47\u0e19 ML Model \u0e16\u0e39\u0e01\u0e04\u0e34\u0e14\u0e04\u0e49\u0e19\u0e21\u0e32\u0e15\u0e31\u0e49\u0e07\u0e41\u0e15\u0e48\u0e22\u0e38\u0e04 60 \u0e41\u0e15\u0e48\u0e01\u0e25\u0e31\u0e1a\u0e21\u0e32\u0e40\u0e01\u0e34\u0e14\u0e43\u0e2b\u0e21\u0e48\u0e40\u0e1b\u0e47\u0e19\u0e01\u0e23\u0e30\u0e41\u0e2a\u0e43\u0e19\u0e22\u0e38\u0e04\u0e17\u0e35\u0e48\u0e04\u0e2d\u0e21\u0e1e\u0e34\u0e27\u0e40\u0e15\u0e2d\u0e23\u0e4c\u0e21\u0e35\u0e04\u0e27\u0e32\u0e21\u0e2a\u0e32\u0e21\u0e32\u0e23\u0e16\u0e17\u0e35\u0e48\u0e08\u0e30\u0e23\u0e2d\u0e07\u0e23\u0e31\u0e1a\u0e01\u0e32\u0e23\u0e04\u0e33\u0e19\u0e27\u0e13 Big Data \u0e44\u0e14\u0e49\u0e41\u0e25\u0e49\u0e27\u0e19\u0e31\u0e48\u0e19\u0e40\u0e2d\u0e07<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Deep RL \u0e04\u0e37\u0e2d\u0e01\u0e32\u0e23\u0e40\u0e2d\u0e32\u0e40\u0e17\u0e04\u0e19\u0e34\u0e04\u0e02\u0e2d\u0e07 deep learning \u0e21\u0e32\u0e1b\u0e23\u0e30\u0e22\u0e38\u0e01\u0e15\u0e4c\u0e43\u0e0a\u0e49\u0e43\u0e19 RL \u0e01\u0e25\u0e48\u0e32\u0e27\u0e07\u0e48\u0e32\u0e22\u0e46\u0e04\u0e37\u0e2d \u0e40\u0e2d\u0e32 Neural Network \u0e21\u0e32\u0e40\u0e1b\u0e47\u0e19\u0e40\u0e04\u0e23\u0e37\u0e48\u0e2d\u0e07\u0e21\u0e37\u0e2d\u0e43\u0e19\u0e01\u0e32\u0e23 approximate functions \u0e41\u0e25\u0e30 policies \u0e43\u0e19 RL \u0e19\u0e31\u0e49\u0e19\u0e40\u0e2d\u0e07 \u0e0b\u0e36\u0e48\u0e07\u0e43\u0e19\u0e1a\u0e17\u0e04\u0e27\u0e32\u0e21\u0e19\u0e35\u0e49 \u0e40\u0e23\u0e32\u0e08\u0e30\u0e21\u0e32\u0e2d\u0e18\u0e34\u0e1a\u0e32\u0e22 basics \u0e02\u0e2d\u0e07 deep RL \u0e1c\u0e48\u0e32\u0e19\u0e42\u0e1b\u0e23\u0e41\u0e01\u0e23\u0e21 simulation \u0e40\u0e1e\u0e37\u0e48\u0e2d\u0e14\u0e39 algorithms \u0e17\u0e35\u0e48\u0e43\u0e0a\u0e49\u0e43\u0e19\u0e01\u0e32\u0e23\u0e41\u0e01\u0e49\u0e1b\u0e31\u0e0d\u0e2b\u0e32 \u0e42\u0e14\u0e22\u0e08\u0e30\u0e40\u0e02\u0e35\u0e22\u0e19\u0e42\u0e1b\u0e23\u0e41\u0e01\u0e23\u0e21\u0e43\u0e19\u0e20\u0e32\u0e29\u0e32 Julia \u0e1a\u0e19 Pluto notebook<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Swinging Pendulum (\u0e25\u0e39\u0e01\u0e15\u0e38\u0e49\u0e21\u0e41\u0e01\u0e27\u0e48\u0e07)<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e42\u0e08\u0e17\u0e22\u0e4c Swinging Pendulum \u0e40\u0e1b\u0e47\u0e19\u0e42\u0e08\u0e17\u0e22\u0e4c\u0e15\u0e31\u0e27\u0e2d\u0e22\u0e48\u0e32\u0e07\u0e17\u0e35\u0e48\u0e04\u0e19\u0e25\u0e07\u0e27\u0e34\u0e0a\u0e32 RL \u0e15\u0e49\u0e2d\u0e07\u0e44\u0e14\u0e49\u0e40\u0e08\u0e2d\u0e17\u0e38\u0e01\u0e04\u0e19 \u0e40\u0e1b\u0e47\u0e19\u0e1b\u0e31\u0e0d\u0e2b\u0e32\u0e43\u0e19\u0e01\u0e25\u0e38\u0e48\u0e21 Markov decision process (MDP) \u0e41\u0e1a\u0e1a fully observable state \u0e01\u0e25\u0e48\u0e32\u0e27\u0e04\u0e37\u0e2d\u0e40\u0e23\u0e32\u0e23\u0e39\u0e49\u0e04\u0e27\u0e32\u0e21\u0e40\u0e1b\u0e47\u0e19\u0e44\u0e1b\u0e02\u0e2d\u0e07 env state \u0e17\u0e31\u0e49\u0e07\u0e2b\u0e21\u0e14 \u0e42\u0e08\u0e17\u0e22\u0e4c\u0e04\u0e37\u0e2d\u0e40\u0e23\u0e32\u0e15\u0e49\u0e2d\u0e07\u0e1e\u0e22\u0e32\u0e22\u0e32\u0e21\u0e43\u0e2a\u0e48 torque \u0e40\u0e02\u0e49\u0e32\u0e44\u0e1b\u0e43\u0e2b\u0e49\u0e40\u0e2b\u0e21\u0e32\u0e30\u0e2a\u0e21 \u0e08\u0e19\u0e01\u0e23\u0e30\u0e17\u0e31\u0e48\u0e07\u0e2a\u0e32\u0e21\u0e32\u0e23\u0e16 balance \u0e25\u0e39\u0e01\u0e15\u0e38\u0e49\u0e21\u0e43\u0e2b\u0e49\u0e15\u0e31\u0e49\u0e07\u0e09\u0e32\u0e01 90 \u0e2d\u0e07\u0e28\u0e32\u0e2d\u0e22\u0e39\u0e48\u0e14\u0e49\u0e32\u0e19\u0e1a\u0e19\u0e44\u0e14\u0e49<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e43\u0e19\u0e02\u0e31\u0e49\u0e19\u0e15\u0e2d\u0e19\u0e16\u0e31\u0e14\u0e44\u0e1b \u0e40\u0e23\u0e32\u0e15\u0e49\u0e2d\u0e07\u0e01\u0e33\u0e2b\u0e19\u0e14 state space (\u0e41\u0e17\u0e19\u0e14\u0e49\u0e27\u0e22 \u03b4) \u0e02\u0e2d\u0e07\u0e42\u0e08\u0e17\u0e22\u0e4c \u0e0b\u0e36\u0e48\u0e07\u0e43\u0e19\u0e01\u0e23\u0e13\u0e35\u0e25\u0e39\u0e01\u0e15\u0e38\u0e49\u0e21\u0e41\u0e01\u0e27\u0e48\u0e07\u0e08\u0e30\u0e40\u0e1b\u0e47\u0e19\u0e21\u0e38\u0e21\u0e2d\u0e07\u0e28\u0e32\u0e02\u0e2d\u0e07\u0e25\u0e39\u0e01\u0e15\u0e38\u0e49\u0e21 \u03b8\u00a0\u0e01\u0e31\u0e1a\u0e04\u0e27\u0e32\u0e21\u0e40\u0e23\u0e47\u0e27\u0e40\u0e0a\u0e34\u0e07\u0e21\u0e38\u0e21 \u03c9\u00a0\u0e0b\u0e36\u0e48\u0e07\u0e01\u0e47\u0e04\u0e37\u0e2d<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"617\" height=\"159\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-2.png\" alt=\"\" class=\"wp-image-3984\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-2.png 617w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-2-300x77.png 300w\" sizes=\"(max-width: 617px) 100vw, 617px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e42\u0e14\u0e22\u0e17\u0e31\u0e49\u0e07\u00a0\u03b8\u00a0\u0e01\u0e31\u0e1a \u03c9\u00a0\u0e40\u0e1b\u0e47\u0e19\u0e15\u0e31\u0e27\u0e41\u0e1b\u0e23\u0e2a\u0e38\u0e48\u0e21\u0e01\u0e23\u0e30\u0e08\u0e32\u0e22\u0e15\u0e31\u0e27\u0e41\u0e1a\u0e1a Gaussian \u0e42\u0e14\u0e22\u0e21\u0e35\u0e04\u0e48\u0e32 mean \u0e40\u0e1b\u0e47\u0e19 0\u00a0\u0e41\u0e25\u0e30\u0e04\u0e48\u0e32\u0e04\u0e27\u0e32\u0e21\u0e40\u0e1a\u0e35\u0e48\u0e22\u0e07\u0e40\u0e1a\u0e19\u0e40\u0e1b\u0e47\u0e19\u00a0\u00a0(\u0e40\u0e02\u0e35\u0e22\u0e19\u0e2a\u0e31\u0e0d\u0e25\u0e31\u0e01\u0e29\u0e13\u0e4c\u0e22\u0e48\u0e2d\u0e40\u0e1b\u0e47\u0e19 ~N(0,1))<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e40\u0e21\u0e37\u0e48\u0e2d\u0e21\u0e35 state space \u0e41\u0e25\u0e49\u0e27 \u0e40\u0e23\u0e32\u0e15\u0e49\u0e2d\u0e07\u0e01\u0e33\u0e2b\u0e19\u0e14 action space (\u0e41\u0e17\u0e19\u0e14\u0e49\u0e27\u0e22 A) \u0e0b\u0e36\u0e48\u0e07\u0e01\u0e47\u0e04\u0e37\u0e2d\u0e40\u0e0b\u0e15\u0e02\u0e2d\u0e07\u0e27\u0e34\u0e18\u0e35\u0e01\u0e32\u0e23\u0e43\u0e2a\u0e48 torque \u0e43\u0e2b\u0e49\u0e01\u0e31\u0e1a\u0e25\u0e39\u0e01\u0e15\u0e38\u0e49\u0e21 \u0e43\u0e2a\u0e48\u0e41\u0e1a\u0e1a\u0e17\u0e27\u0e19\u0e40\u0e02\u0e47\u0e21 \u0e15\u0e32\u0e21\u0e40\u0e02\u0e47\u0e21 \u0e43\u0e2a\u0e48\u0e01\u0e35\u0e48 Newton-Meter \u0e0b\u0e36\u0e48\u0e07\u0e43\u0e19\u0e42\u0e08\u0e17\u0e22\u0e4c RL \u0e19\u0e31\u0e49\u0e19\u0e08\u0e30\u0e41\u0e1a\u0e48\u0e07\u0e2d\u0e2d\u0e01\u0e40\u0e1b\u0e47\u0e19 2 \u0e01\u0e25\u0e38\u0e48\u0e21\u0e43\u0e2b\u0e0d\u0e48\u0e46 \u0e04\u0e37\u0e2d \u0e42\u0e08\u0e17\u0e22\u0e4c\u0e17\u0e35\u0e48 action space \u0e40\u0e1b\u0e47\u0e19 discrete \u0e01\u0e31\u0e1a continuous \u0e0b\u0e36\u0e48\u0e07 algorithms \u0e17\u0e35\u0e48\u0e43\u0e0a\u0e49\u0e43\u0e19\u0e01\u0e32\u0e23\u0e41\u0e01\u0e49\u0e1b\u0e31\u0e0d\u0e2b\u0e32\u0e17\u0e31\u0e49\u0e07 2 \u0e41\u0e1a\u0e1a\u0e01\u0e47\u0e08\u0e30\u0e41\u0e15\u0e01\u0e15\u0e48\u0e32\u0e07\u0e01\u0e31\u0e19 \u0e43\u0e19\u0e1a\u0e17\u0e04\u0e27\u0e32\u0e21\u0e19\u0e35\u0e49\u0e08\u0e30\u0e22\u0e01\u0e15\u0e31\u0e27\u0e2d\u0e22\u0e48\u0e32\u0e07 2 \u0e01\u0e23\u0e30\u0e1a\u0e27\u0e19\u0e01\u0e32\u0e23\u0e27\u0e34\u0e18\u0e35\u0e01\u0e32\u0e23\u0e42\u0e08\u0e17\u0e22\u0e4c RL 1) Q-Learning 2) Actor-Critic Method \u0e42\u0e14\u0e22\u0e40\u0e23\u0e32\u0e08\u0e30\u0e43\u0e0a\u0e49 Julia package \u0e0a\u0e37\u0e48\u0e2d Crux \u0e43\u0e19\u0e01\u0e32\u0e23\u0e17\u0e33 simulation \u0e43\u0e2b\u0e49\u0e07\u0e48\u0e32\u0e22\u0e02\u0e36\u0e49\u0e19<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"123\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-6.png\" alt=\"\" class=\"wp-image-3986\" style=\"width:568px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-6.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-6-300x79.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><strong>Deep Q-Learning<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e43\u0e19\u0e1a\u0e23\u0e23\u0e14\u0e32 RL algorithms \u0e41\u0e25\u0e49\u0e27\u0e17\u0e31\u0e49\u0e07\u0e2b\u0e21\u0e14 Q-Learning \u0e2b\u0e23\u0e37\u0e2d Value Learning \u0e16\u0e37\u0e2d\u0e27\u0e48\u0e32\u0e40\u0e1b\u0e47\u0e19\u0e27\u0e34\u0e18\u0e35\u0e01\u0e32\u0e23\u0e17\u0e35\u0e48\u0e1e\u0e37\u0e49\u0e19\u0e10\u0e32\u0e19\u0e17\u0e35\u0e48\u0e2a\u0e38\u0e14\u0e41\u0e25\u0e30\u0e22\u0e31\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e27\u0e34\u0e18\u0e35\u0e01\u0e32\u0e23\u0e17\u0e35\u0e48 mathematical formula \u0e21\u0e35\u0e04\u0e27\u0e32\u0e21 elegant \u0e17\u0e35\u0e48\u0e2a\u0e38\u0e14\u0e27\u0e34\u0e18\u0e35\u0e19\u0e36\u0e07 (\u0e0b\u0e36\u0e48\u0e07\u0e2d\u0e30\u0e44\u0e23\u0e17\u0e35\u0e48\u0e21\u0e31\u0e19 mathematically elegant \u0e40\u0e19\u0e35\u0e49\u0e22\u0e21\u0e31\u0e01\u0e08\u0e30\u0e43\u0e0a\u0e49\u0e01\u0e31\u0e1a\u0e42\u0e08\u0e17\u0e22\u0e4c\u0e15\u0e38\u0e4a\u0e01\u0e15\u0e32\u0e40\u0e1b\u0e47\u0e19\u0e2b\u0e25\u0e31\u0e01 \u0e40\u0e2d\u0e32\u0e21\u0e32\u0e43\u0e0a\u0e49\u0e08\u0e23\u0e34\u0e07\u0e21\u0e31\u0e01\u0e08\u0e30\u0e44\u0e21\u0e48\u0e04\u0e48\u0e2d\u0e22\u0e21\u0e35\u0e1b\u0e23\u0e30\u0e2a\u0e34\u0e17\u0e18\u0e34\u0e20\u0e32\u0e1e\u0e21\u0e32\u0e01\u0e19\u0e31\u0e01) \u0e42\u0e14\u0e22\u0e40\u0e23\u0e32\u0e08\u0e30\u0e2b\u0e32 optimal policy \u0e08\u0e32\u0e01\u0e01\u0e32\u0e23\u0e40\u0e25\u0e37\u0e2d\u0e01 action \u0e17\u0e35\u0e48 maximize \u0e15\u0e31\u0e27 Q function (\u0e40\u0e1b\u0e47\u0e19 function \u0e02\u0e2d\u0e07 reward) \u0e0b\u0e36\u0e48\u0e07\u0e0a\u0e31\u0e14\u0e40\u0e08\u0e19\u0e27\u0e48\u0e32 action space \u0e43\u0e19\u0e01\u0e23\u0e13\u0e35\u0e19\u0e35\u0e49\u0e08\u0e30\u0e15\u0e49\u0e2d\u0e07 finite \u0e2b\u0e23\u0e37\u0e2d discrete \u0e19\u0e31\u0e49\u0e19\u0e40\u0e2d\u0e07 \u0e42\u0e14\u0e22 algorithm \u0e17\u0e35\u0e48\u0e40\u0e23\u0e32\u0e08\u0e30\u0e43\u0e0a\u0e49\u0e43\u0e19\u0e01\u0e32\u0e23\u0e41\u0e01\u0e49\u0e1b\u0e31\u0e0d\u0e2b\u0e32\u0e40\u0e04\u0e2a\u0e19\u0e35\u0e49\u0e04\u0e37\u0e2d deep q-network (DQN) \u0e43\u0e19\u0e01\u0e32\u0e23\u0e40\u0e25\u0e37\u0e2d\u0e01 optimal policy&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"583\" height=\"95\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-7.png\" alt=\"\" class=\"wp-image-3988\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-7.png 583w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-7-300x49.png 300w\" sizes=\"(max-width: 583px) 100vw, 583px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e08\u0e32\u0e01 discrete action space A<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e42\u0e14\u0e22 neural network<em>&nbsp;<\/em>\u0e17\u0e35\u0e48\u0e08\u0e30\u0e43\u0e0a\u0e49\u0e43\u0e19\u0e01\u0e32\u0e23\u0e40\u0e25\u0e37\u0e2d\u0e01 \u03c0<sub>DQN<\/sub>(s)&nbsp;\u0e08\u0e32\u0e01 A&nbsp;\u0e08\u0e30\u0e40\u0e1b\u0e47\u0e19 3-layer discrete network (\u0e02\u0e2d\u0e40\u0e23\u0e35\u0e22\u0e01\u0e27\u0e48\u0e32<em>&nbsp;<\/em>q-network)<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"94\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-8.png\" alt=\"\" class=\"wp-image-3990\" style=\"width:620px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-8.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-8-300x60.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e40\u0e21\u0e37\u0e48\u0e2d\u0e44\u0e14\u0e49 q-network \u0e41\u0e25\u0e49\u0e27 \u0e40\u0e23\u0e32\u0e08\u0e36\u0e07\u0e2a\u0e23\u0e49\u0e32\u0e07 DQN solver<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"36\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-9.png\" alt=\"\" class=\"wp-image-3992\" style=\"width:656px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-9.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-9-300x23.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e40\u0e1e\u0e37\u0e48\u0e2d\u0e41\u0e01\u0e49\u0e2b\u0e32<em>\u00a0<\/em>optimal policy \u0e44\u0e14\u0e49<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"32\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-10.png\" alt=\"\" class=\"wp-image-3994\" style=\"width:618px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-10.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-10-300x21.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><strong>Actor-Critic Method<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e16\u0e49\u0e32<em>&nbsp;<\/em>q-learning \u0e40\u0e1b\u0e47\u0e19\u0e27\u0e34\u0e18\u0e35\u0e01\u0e32\u0e23\u0e17\u0e35\u0e48\u0e2d\u0e22\u0e39\u0e48\u0e43\u0e19\u0e1d\u0e31\u0e48\u0e07\u0e04\u0e48\u0e2d\u0e19\u0e44\u0e1b\u0e17\u0e32\u0e07\u0e1e\u0e37\u0e49\u0e19\u0e17\u0e32\u0e07 Actor-Critic (A2C) \u0e01\u0e47\u0e16\u0e37\u0e2d\u0e40\u0e1b\u0e47\u0e19\u0e27\u0e34\u0e18\u0e35\u0e01\u0e32\u0e23\u0e17\u0e35\u0e48\u0e04\u0e48\u0e2d\u0e19\u0e02\u0e49\u0e32\u0e07 advanced \u0e2a\u0e32\u0e21\u0e32\u0e23\u0e16\u0e23\u0e2d\u0e07\u0e23\u0e31\u0e1a\u0e42\u0e08\u0e17\u0e22\u0e4c\u0e17\u0e35\u0e48\u0e21\u0e35\u0e04\u0e27\u0e32\u0e21\u0e0b\u0e31\u0e1a\u0e0b\u0e49\u0e2d\u0e19\u0e21\u0e32\u0e01\u0e02\u0e36\u0e49\u0e19 \u0e42\u0e14\u0e22 A2C \u0e43\u0e0a\u0e49 surrogate \u0e02\u0e2d\u0e07 q-function (Critic) \u0e21\u0e32\u0e0a\u0e48\u0e27\u0e22\u0e43\u0e19\u0e01\u0e32\u0e23 optimize \u0e15\u0e31\u0e27 policy (Actor) \u0e42\u0e14\u0e22\u0e15\u0e23\u0e07<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"264\" height=\"262\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-12.png\" alt=\"\" class=\"wp-image-3996\" style=\"width:356px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-12.png 264w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-12-150x150.png 150w\" sizes=\"(max-width: 264px) 100vw, 264px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e02\u0e49\u0e2d\u0e44\u0e14\u0e49\u0e40\u0e1b\u0e23\u0e35\u0e22\u0e1a\u0e17\u0e35\u0e48 A2C \u0e21\u0e35\u0e40\u0e2b\u0e19\u0e37\u0e2d\u0e01\u0e27\u0e48\u0e32 Q-Learning \u0e2d\u0e22\u0e48\u0e32\u0e07\u0e0a\u0e31\u0e14\u0e40\u0e08\u0e19\u0e04\u0e37\u0e2d\u0e21\u0e31\u0e19\u0e2a\u0e32\u0e21\u0e32\u0e23\u0e16\u0e23\u0e31\u0e1a\u0e21\u0e37\u0e2d\u0e01\u0e31\u0e1a continuous action space \u0e44\u0e14\u0e49 \u0e42\u0e14\u0e22\u0e40\u0e23\u0e32\u0e08\u0e30\u0e43\u0e2b\u0e49 action \u0e01\u0e23\u0e30\u0e08\u0e32\u0e22\u0e15\u0e31\u0e27\u0e41\u0e1a\u0e1a Gaussian \u0e41\u0e25\u0e30 Actor follows Gaussian policy \u0e0b\u0e36\u0e48\u0e07\u0e40\u0e23\u0e32\u0e08\u0e30\u0e43\u0e0a\u0e49\u0e40\u0e17\u0e04\u0e19\u0e34\u0e04\u0e02\u0e2d\u0e07 deep learning \u0e43\u0e19\u0e17\u0e33\u0e19\u0e2d\u0e07\u0e40\u0e14\u0e35\u0e22\u0e27\u0e01\u0e31\u0e1a q-network \u0e42\u0e14\u0e22\u0e40\u0e23\u0e32\u0e08\u0e30\u0e40\u0e25\u0e37\u0e2d\u0e01 mean \u0e02\u0e2d\u0e07 Gaussian policy \u0e42\u0e14\u0e22\u0e43\u0e0a\u0e49 neural network (\u0e02\u0e2d\u0e40\u0e23\u0e35\u0e22\u0e01\u0e27\u0e48\u0e32 policy network)<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"149\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-13.png\" alt=\"\" class=\"wp-image-3998\" style=\"width:610px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-13.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-13-300x96.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e41\u0e25\u0e30\u0e43\u0e19\u0e2a\u0e48\u0e27\u0e19\u0e02\u0e2d\u0e07 Critic \u0e19\u0e31\u0e49\u0e19\u0e04\u0e37\u0e2d surrogate function \u0e02\u0e2d\u0e07 q function \u0e40\u0e23\u0e35\u0e22\u0e01\u0e27\u0e48\u0e32 advantage A(s,a)&nbsp;\u0e19\u0e34\u0e22\u0e32\u0e21\u0e42\u0e14\u0e22<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" width=\"640\" height=\"213\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-14.png\" alt=\"\" class=\"wp-image-4000\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-14.png 640w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-14-300x100.png 300w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e0b\u0e36\u0e48\u0e07\u0e40\u0e23\u0e32\u0e08\u0e30\u0e43\u0e0a\u0e49 neural network \u0e40\u0e1b\u0e47\u0e19 function approximator \u0e02\u0e2d\u0e07&nbsp;A(s,a)&nbsp;\u0e42\u0e14\u0e22\u0e15\u0e23\u0e07<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"100\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-15.png\" alt=\"\" class=\"wp-image-4002\" style=\"width:624px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-15.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-15-300x64.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e40\u0e21\u0e37\u0e48\u0e2d\u0e44\u0e14\u0e49\u0e17\u0e31\u0e49\u0e07 Actor \u0e41\u0e25\u0e30 Critic \u0e41\u0e25\u0e49\u0e27 \u0e40\u0e23\u0e32\u0e01\u0e47\u0e2a\u0e32\u0e21\u0e32\u0e23\u0e16\u0e01\u0e33\u0e2b\u0e19\u0e14 policy \u0e44\u0e14\u0e49<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"34\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-16.png\" alt=\"\" class=\"wp-image-4004\" style=\"width:636px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-16.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-16-300x22.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e43\u0e19\u0e40\u0e04\u0e2a\u0e19\u0e35\u0e49 \u0e40\u0e23\u0e32\u0e08\u0e30\u0e43\u0e0a\u0e49 algorithm \u0e17\u0e35\u0e48\u0e0a\u0e37\u0e48\u0e2d\u0e27\u0e48\u0e32 Proximal Policy Optimization (PPO) \u0e0b\u0e36\u0e48\u0e07\u0e40\u0e1b\u0e47\u0e19 policy-based RL algorithm \u0e2a\u0e33\u0e2b\u0e23\u0e31\u0e1a deep learning \u0e42\u0e14\u0e22\u0e40\u0e09\u0e1e\u0e32\u0e30<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"468\" height=\"34\" src=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-17.png\" alt=\"\" class=\"wp-image-4006\" style=\"width:632px;height:auto\" srcset=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-17.png 468w, https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/image-17-300x22.png 300w\" sizes=\"(max-width: 468px) 100vw, 468px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">\u0e40\u0e1e\u0e37\u0e48\u0e2d\u0e41\u0e01\u0e49\u0e2b\u0e32 optimal policy \u0e2d\u0e2d\u0e01\u0e21\u0e32\u0e44\u0e14\u0e49<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e14\u0e49\u0e27\u0e22\u0e04\u0e27\u0e32\u0e21\u0e17\u0e35\u0e48\u0e42\u0e08\u0e17\u0e22\u0e4c\u0e25\u0e39\u0e01\u0e15\u0e38\u0e49\u0e21\u0e40\u0e1b\u0e47\u0e19\u0e42\u0e08\u0e17\u0e22\u0e4c basic \u0e40\u0e23\u0e32\u0e08\u0e36\u0e07\u0e2d\u0e32\u0e08\u0e44\u0e21\u0e48\u0e40\u0e2b\u0e47\u0e19\u0e04\u0e27\u0e32\u0e21\u0e15\u0e48\u0e32\u0e07\u0e02\u0e2d\u0e07\u0e27\u0e34\u0e18\u0e35\u0e01\u0e32\u0e23\u0e17\u0e31\u0e49\u0e07\u0e2a\u0e2d\u0e07\u0e21\u0e32\u0e01 \u0e41\u0e15\u0e48\u0e41\u0e19\u0e48\u0e19\u0e2d\u0e19\u0e27\u0e48\u0e32\u0e43\u0e19\u0e42\u0e08\u0e17\u0e22\u0e4c\u0e17\u0e35\u0e48\u0e21\u0e35\u0e04\u0e27\u0e32\u0e21\u0e0b\u0e31\u0e1a\u0e0b\u0e49\u0e2d\u0e19\u0e21\u0e32\u0e01\u0e02\u0e36\u0e49\u0e19 policy \u0e15\u0e49\u0e2d\u0e07\u0e21\u0e35\u0e04\u0e27\u0e32\u0e21\u0e41\u0e21\u0e48\u0e19\u0e22\u0e33\u0e43\u0e19\u0e01\u0e32\u0e23\u0e01\u0e33\u0e2b\u0e19\u0e14 action \u0e43\u0e2b\u0e49\u0e44\u0e14\u0e49\u0e23\u0e30\u0e14\u0e31\u0e1a\u0e17\u0e28\u0e19\u0e34\u0e22\u0e21\u0e2b\u0e25\u0e32\u0e22\u0e15\u0e33\u0e41\u0e2b\u0e19\u0e48\u0e07\u0e19\u0e31\u0e49\u0e19 \u0e01\u0e32\u0e23\u0e40\u0e25\u0e37\u0e2d\u0e01\u0e43\u0e0a\u0e49 algorithms \u0e43\u0e2b\u0e49\u0e40\u0e2b\u0e21\u0e32\u0e30\u0e2a\u0e21\u0e08\u0e36\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e2a\u0e34\u0e48\u0e07\u0e2a\u0e33\u0e04\u0e31\u0e0d\u0e21\u0e32\u0e01<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u0e2a\u0e33\u0e2b\u0e23\u0e31\u0e1a\u0e1a\u0e17\u0e04\u0e27\u0e32\u0e21\u0e19\u0e35\u0e49\u0e01\u0e47\u0e08\u0e1a\u0e25\u0e07\u0e40\u0e1e\u0e35\u0e22\u0e07\u0e40\u0e17\u0e48\u0e32\u0e19\u0e35\u0e49 \u0e2d\u0e32\u0e08\u0e08\u0e30\u0e21\u0e35\u0e40\u0e19\u0e37\u0e49\u0e2d\u0e2b\u0e32\u0e1a\u0e32\u0e07\u0e2d\u0e22\u0e48\u0e32\u0e07\u0e17\u0e35\u0e48\u0e22\u0e31\u0e07\u0e2d\u0e18\u0e34\u0e1a\u0e32\u0e22\u0e07\u0e07\u0e46\u0e1a\u0e49\u0e32\u0e07\u0e01\u0e47\u0e02\u0e2d\u0e1b\u0e23\u0e30\u0e17\u0e32\u0e19\u0e42\u0e17\u0e29\u0e25\u0e48\u0e27\u0e07\u0e2b\u0e19\u0e49\u0e32\u0e19\u0e30\u0e04\u0e23\u0e31\u0e1a \u0e2d\u0e22\u0e48\u0e32\u0e07\u0e19\u0e49\u0e2d\u0e22\u0e04\u0e34\u0e14\u0e27\u0e48\u0e32\u0e40\u0e1b\u0e47\u0e19\u0e01\u0e32\u0e23\u0e41\u0e25\u0e01\u0e40\u0e1b\u0e25\u0e35\u0e48\u0e22\u0e19\u0e01\u0e31\u0e1a\u0e1c\u0e39\u0e49\u0e2d\u0e48\u0e32\u0e19\u0e41\u0e25\u0e30\u0e41\u0e1a\u0e48\u0e07\u0e1b\u0e31\u0e19\u0e21\u0e38\u0e21\u0e21\u0e2d\u0e07\u0e17\u0e35\u0e48\u0e44\u0e14\u0e49\u0e15\u0e01\u0e1c\u0e25\u0e36\u0e01\u0e21\u0e32\u0e0b\u0e36\u0e48\u0e07\u0e2a\u0e32\u0e21\u0e32\u0e23\u0e16\u0e44\u0e1b\u0e08\u0e38\u0e14\u0e1b\u0e23\u0e30\u0e01\u0e32\u0e22\u0e04\u0e27\u0e32\u0e21\u0e04\u0e34\u0e14\u0e1a\u0e32\u0e07\u0e2d\u0e22\u0e48\u0e32\u0e07\u0e40\u0e1e\u0e34\u0e48\u0e21\u0e40\u0e15\u0e34\u0e21\u0e44\u0e14\u0e49 \u0e01\u0e47\u0e16\u0e37\u0e2d\u0e27\u0e48\u0e32\u0e1a\u0e17\u0e04\u0e27\u0e32\u0e21\u0e19\u0e35\u0e49\u0e44\u0e14\u0e49\u0e17\u0e33\u0e2b\u0e19\u0e49\u0e32\u0e17\u0e35\u0e48\u0e02\u0e2d\u0e07\u0e21\u0e31\u0e19\u0e41\u0e25\u0e49\u0e27\u0e25\u0e30\u0e04\u0e23\u0e31\u0e1a \u0e15\u0e49\u0e2d\u0e07\u0e02\u0e2d\u0e02\u0e2d\u0e1a\u0e04\u0e38\u0e13\u0e17\u0e48\u0e32\u0e19\u0e1c\u0e39\u0e49\u0e2d\u0e48\u0e32\u0e19\u0e17\u0e38\u0e01\u0e17\u0e48\u0e32\u0e19\u0e17\u0e35\u0e48\u0e2a\u0e25\u0e30\u0e40\u0e27\u0e25\u0e32\u0e2d\u0e48\u0e32\u0e19\u0e21\u0e32\u0e08\u0e19\u0e08\u0e1a\u0e14\u0e49\u0e27\u0e22\u0e04\u0e23\u0e31\u0e1a\u0e1c\u0e21<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u0e43\u0e19\u0e1a\u0e23\u0e23\u0e14\u0e32 subfields \u0e02\u0e2d\u0e07 machine learning \u0e13 \u0e15\u0e2d\u0e19\u0e19\u0e35\u0e49\u0e04\u0e07\u0e44\u0e21\u0e48\u0e21\u0e35\u0e2d\u0e30\u0e44\u0e23\u0e17\u0e35\u0e48\u0e01\u0e33\u0e25\u0e31\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e01\u0e23\u0e30\u0e41\u0e2a\u0e21\u0e32\u0e01\u0e44\u0e1b\u0e01\u0e27\u0e48\u0e32 reinforce-ment learning \u0e2d\u0e35\u0e01\u0e41\u0e25\u0e49\u0e27 (\u0e15\u0e48\u0e2d\u0e44\u0e1b\u0e08\u0e30\u0e02\u0e2d\u0e40\u0e23\u0e35\u0e22\u0e01\u0e22\u0e48\u0e2d\u0e27\u0e48\u0e32 RL) \u0e0b\u0e36\u0e48\u0e07\u0e16\u0e49\u0e32\u0e40\u0e23\u0e32\u0e22\u0e49\u0e2d\u0e19\u0e01\u0e25\u0e31\u0e1a\u0e44\u0e1b\u0e0b\u0e31\u0e01 2-3 \u0e1b\u0e35\u0e17\u0e35\u0e48\u0e41\u0e25\u0e49\u0e27 \u0e15\u0e33\u0e41\u0e2b\u0e19\u0e48\u0e07\u0e19\u0e35\u0e49\u0e22\u0e31\u0e07\u0e04\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e02\u0e2d\u0e07 deep learning \u0e2d\u0e22\u0e39\u0e48\u0e40\u0e25\u0e22 \u0e0a\u0e31\u0e14\u0e40\u0e08\u0e19\u0e27\u0e48\u0e32\u0e42\u0e25\u0e01\u0e40\u0e1b\u0e25\u0e35\u0e48\u0e22\u0e19\u0e40\u0e23\u0e47\u0e27\u0e41\u0e04\u0e48\u0e44\u0e2b\u0e19\u0e01\u0e31\u0e19<\/p>\n","protected":false},"featured_media":3980,"template":"","featured_blog":[],"blog_industry":[],"blog_topic":[181],"blog_service":[],"class_list":["post-4082","blog","type-blog","status-publish","has-post-thumbnail","hentry","blog_topic-biz-tech-trends-th"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep - Bluebik<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/\" \/>\n<meta property=\"og:locale\" content=\"th_TH\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep - Bluebik\" \/>\n<meta property=\"og:description\" content=\"\u0e43\u0e19\u0e1a\u0e23\u0e23\u0e14\u0e32 subfields \u0e02\u0e2d\u0e07 machine learning \u0e13 \u0e15\u0e2d\u0e19\u0e19\u0e35\u0e49\u0e04\u0e07\u0e44\u0e21\u0e48\u0e21\u0e35\u0e2d\u0e30\u0e44\u0e23\u0e17\u0e35\u0e48\u0e01\u0e33\u0e25\u0e31\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e01\u0e23\u0e30\u0e41\u0e2a\u0e21\u0e32\u0e01\u0e44\u0e1b\u0e01\u0e27\u0e48\u0e32 reinforce-ment learning \u0e2d\u0e35\u0e01\u0e41\u0e25\u0e49\u0e27 (\u0e15\u0e48\u0e2d\u0e44\u0e1b\u0e08\u0e30\u0e02\u0e2d\u0e40\u0e23\u0e35\u0e22\u0e01\u0e22\u0e48\u0e2d\u0e27\u0e48\u0e32 RL) \u0e0b\u0e36\u0e48\u0e07\u0e16\u0e49\u0e32\u0e40\u0e23\u0e32\u0e22\u0e49\u0e2d\u0e19\u0e01\u0e25\u0e31\u0e1a\u0e44\u0e1b\u0e0b\u0e31\u0e01 2-3 \u0e1b\u0e35\u0e17\u0e35\u0e48\u0e41\u0e25\u0e49\u0e27 \u0e15\u0e33\u0e41\u0e2b\u0e19\u0e48\u0e07\u0e19\u0e35\u0e49\u0e22\u0e31\u0e07\u0e04\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e02\u0e2d\u0e07 deep learning \u0e2d\u0e22\u0e39\u0e48\u0e40\u0e25\u0e22 \u0e0a\u0e31\u0e14\u0e40\u0e08\u0e19\u0e27\u0e48\u0e32\u0e42\u0e25\u0e01\u0e40\u0e1b\u0e25\u0e35\u0e48\u0e22\u0e19\u0e40\u0e23\u0e47\u0e27\u0e41\u0e04\u0e48\u0e44\u0e2b\u0e19\u0e01\u0e31\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Bluebik\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-10T07:45:03+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/Thumbnail-Blog040.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"900\" \/>\n\t<meta property=\"og:image:height\" content=\"600\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"6 \u0e19\u0e32\u0e17\u0e35\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/deep-reinforcement-learning\\\/\",\"url\":\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/deep-reinforcement-learning\\\/\",\"name\":\"Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep - Bluebik\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/bluebik.com\\\/th\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/deep-reinforcement-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/deep-reinforcement-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/bluebik.com\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/Thumbnail-Blog040.jpg\",\"datePublished\":\"2022-03-24T03:28:00+00:00\",\"dateModified\":\"2025-04-10T07:45:03+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/deep-reinforcement-learning\\\/#breadcrumb\"},\"inLanguage\":\"th\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/deep-reinforcement-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"th\",\"@id\":\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/deep-reinforcement-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/bluebik.com\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/Thumbnail-Blog040.jpg\",\"contentUrl\":\"https:\\\/\\\/bluebik.com\\\/wp-content\\\/uploads\\\/2025\\\/03\\\/Thumbnail-Blog040.jpg\",\"width\":900,\"height\":600},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/deep-reinforcement-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/bluebik.com\\\/th\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Blogs\",\"item\":\"https:\\\/\\\/bluebik.com\\\/th\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/bluebik.com\\\/th\\\/#website\",\"url\":\"https:\\\/\\\/bluebik.com\\\/th\\\/\",\"name\":\"Bluebik\",\"description\":\"Bluebik\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/bluebik.com\\\/th\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"th\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep - Bluebik","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/","og_locale":"th_TH","og_type":"article","og_title":"Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep - Bluebik","og_description":"\u0e43\u0e19\u0e1a\u0e23\u0e23\u0e14\u0e32 subfields \u0e02\u0e2d\u0e07 machine learning \u0e13 \u0e15\u0e2d\u0e19\u0e19\u0e35\u0e49\u0e04\u0e07\u0e44\u0e21\u0e48\u0e21\u0e35\u0e2d\u0e30\u0e44\u0e23\u0e17\u0e35\u0e48\u0e01\u0e33\u0e25\u0e31\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e01\u0e23\u0e30\u0e41\u0e2a\u0e21\u0e32\u0e01\u0e44\u0e1b\u0e01\u0e27\u0e48\u0e32 reinforce-ment learning \u0e2d\u0e35\u0e01\u0e41\u0e25\u0e49\u0e27 (\u0e15\u0e48\u0e2d\u0e44\u0e1b\u0e08\u0e30\u0e02\u0e2d\u0e40\u0e23\u0e35\u0e22\u0e01\u0e22\u0e48\u0e2d\u0e27\u0e48\u0e32 RL) \u0e0b\u0e36\u0e48\u0e07\u0e16\u0e49\u0e32\u0e40\u0e23\u0e32\u0e22\u0e49\u0e2d\u0e19\u0e01\u0e25\u0e31\u0e1a\u0e44\u0e1b\u0e0b\u0e31\u0e01 2-3 \u0e1b\u0e35\u0e17\u0e35\u0e48\u0e41\u0e25\u0e49\u0e27 \u0e15\u0e33\u0e41\u0e2b\u0e19\u0e48\u0e07\u0e19\u0e35\u0e49\u0e22\u0e31\u0e07\u0e04\u0e07\u0e40\u0e1b\u0e47\u0e19\u0e02\u0e2d\u0e07 deep learning \u0e2d\u0e22\u0e39\u0e48\u0e40\u0e25\u0e22 \u0e0a\u0e31\u0e14\u0e40\u0e08\u0e19\u0e27\u0e48\u0e32\u0e42\u0e25\u0e01\u0e40\u0e1b\u0e25\u0e35\u0e48\u0e22\u0e19\u0e40\u0e23\u0e47\u0e27\u0e41\u0e04\u0e48\u0e44\u0e2b\u0e19\u0e01\u0e31\u0e19","og_url":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/","og_site_name":"Bluebik","article_modified_time":"2025-04-10T07:45:03+00:00","og_image":[{"width":900,"height":600,"url":"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/Thumbnail-Blog040.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"6 \u0e19\u0e32\u0e17\u0e35"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/","url":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/","name":"Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep - Bluebik","isPartOf":{"@id":"https:\/\/bluebik.com\/th\/#website"},"primaryImageOfPage":{"@id":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/#primaryimage"},"image":{"@id":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/Thumbnail-Blog040.jpg","datePublished":"2022-03-24T03:28:00+00:00","dateModified":"2025-04-10T07:45:03+00:00","breadcrumb":{"@id":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/#breadcrumb"},"inLanguage":"th","potentialAction":[{"@type":"ReadAction","target":["https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/"]}]},{"@type":"ImageObject","inLanguage":"th","@id":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/#primaryimage","url":"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/Thumbnail-Blog040.jpg","contentUrl":"https:\/\/bluebik.com\/wp-content\/uploads\/2025\/03\/Thumbnail-Blog040.jpg","width":900,"height":600},{"@type":"BreadcrumbList","@id":"https:\/\/bluebik.com\/th\/blog\/deep-reinforcement-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/bluebik.com\/th\/"},{"@type":"ListItem","position":2,"name":"Blogs","item":"https:\/\/bluebik.com\/th\/blog\/"},{"@type":"ListItem","position":3,"name":"Deep Reinforcement Learning \u0e41\u0e1a\u0e1a\u0e44\u0e21\u0e48 Deep"}]},{"@type":"WebSite","@id":"https:\/\/bluebik.com\/th\/#website","url":"https:\/\/bluebik.com\/th\/","name":"Bluebik","description":"Bluebik","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/bluebik.com\/th\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"th"}]}},"_links":{"self":[{"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/blog\/4082","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/types\/blog"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/media\/3980"}],"wp:attachment":[{"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/media?parent=4082"}],"wp:term":[{"taxonomy":"featured_blog","embeddable":true,"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/featured_blog?post=4082"},{"taxonomy":"blog_industry","embeddable":true,"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/blog_industry?post=4082"},{"taxonomy":"blog_topic","embeddable":true,"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/blog_topic?post=4082"},{"taxonomy":"blog_service","embeddable":true,"href":"https:\/\/bluebik.com\/th\/wp-json\/wp\/v2\/blog_service?post=4082"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}