\section{Policies}
For each environment, there exists a corresponding policy.
\input{02_product-documentation/04_implementation/02_policies/01_start_policy}
\subsection{Start policy}
The start policy handles the takeoff of the drone.
However, since takeoff and landing are very similar tasks and the start policy generalizes quite well, it can also be used for the landing task.
\subsubsection{Hyperparameters}
\begin{lstlisting}[language=python]
num_episodes = 10000 # ensures training isn't stopped too early
num_steps_per_episode = 2000
# ensures that episodes in which the policy reaches its
# target remain in the replay buffer; 10000 was too small,
# as only the last 5 episodes were kept in the replay buffer
replay_buffer_capacity = 100000
# simple tasks with a large learning rate require a large
# batch size to prevent the loss from collapsing to 0
batch_size = 1024
# largest possible value; a larger learning rate causes the
# policy to not solve the task
critic_learning_rate = 3e-4
actor_learning_rate = 3e-4
alpha_learning_rate = 3e-4
# simple task -> future rewards can be discounted heavily,
# i.e. gamma can be quite low
# fine-tuned through trial and error
gamma = 0.9
# simple task -> few neurons
# the z velocity requires a second layer
actor_fc_layer_params = (16, 16)
critic_joint_fc_layer_params = actor_fc_layer_params
# the following hyperparameters are taken from utils.py
target_update_tau = 0.005
target_update_period = 1
\end{lstlisting}
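The following sketch shows how these hyperparameters could be wired into a SAC agent.
TF-Agents as the framework, the environment object \lstinline[language=python]{tf_env}, and the network classes are assumptions based on the parameter names above, not necessarily the exact setup used.
\begin{lstlisting}[language=python]
import tensorflow as tf
from tf_agents.agents.ddpg import critic_network
from tf_agents.agents.sac import sac_agent
from tf_agents.networks import actor_distribution_network

# tf_env is a hypothetical TF-Agents environment wrapping the
# start/landing task
observation_spec = tf_env.observation_spec()
action_spec = tf_env.action_spec()

# actor network: maps observations to an action distribution
actor_net = actor_distribution_network.ActorDistributionNetwork(
    observation_spec,
    action_spec,
    fc_layer_params=actor_fc_layer_params)

# critic network: joint layers over (observation, action) pairs
critic_net = critic_network.CriticNetwork(
    (observation_spec, action_spec),
    joint_fc_layer_params=critic_joint_fc_layer_params)

agent = sac_agent.SacAgent(
    tf_env.time_step_spec(),
    action_spec,
    actor_network=actor_net,
    critic_network=critic_net,
    actor_optimizer=tf.keras.optimizers.Adam(
        learning_rate=actor_learning_rate),
    critic_optimizer=tf.keras.optimizers.Adam(
        learning_rate=critic_learning_rate),
    alpha_optimizer=tf.keras.optimizers.Adam(
        learning_rate=alpha_learning_rate),
    target_update_tau=target_update_tau,
    target_update_period=target_update_period,
    gamma=gamma)
agent.initialize()
\end{lstlisting}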
The parameters \lstinline[language=python]{target_update_tau} and \lstinline[language=python]{target_update_period} are explained in
\autoref{sec:implementation/networks-target-networks}.
They only affect the \hyperref[sec:implementation/critic-network]{critic network}.
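Concretely, assuming the common soft (Polyak) update scheme that these parameter names suggest, every \lstinline[language=python]{target_update_period} training steps the target network weights $\theta'$ are moved towards the current critic weights $\theta$:
\begin{equation*}
    \theta' \leftarrow \tau \theta + (1 - \tau) \theta'
\end{equation*}
With $\tau = 0.005$ and a period of $1$, the target network therefore tracks the critic slowly and smoothly.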
\subsubsection{Learning curve}
\begin{figure}[H]
\centering
\includegraphics[width=\linewidth]{implementation/learning_curves/start_policy.png}
\caption{Learning curve of the start policy}
\end{figure}
In the opaque, smoothed curve, it is clearly visible how the policy approaches the optimum, which lies approximately at $-1 \times 10^4$.
The partially transparent, unsmoothed curve already shows good results at 70k steps, but while testing the policy, I noticed that it still performs poorly in some cases.
Therefore, the smoothed curve is the better indicator of whether the policy successfully solves the task.
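For reference, the smoothed curve can be approximated from the raw episode returns with a TensorBoard-style exponential moving average; the following is a sketch under that assumption, not necessarily the exact smoothing applied by the plotting tool.
\begin{lstlisting}[language=python]
def smooth(values, weight=0.97):
    # TensorBoard-style exponential moving average: each point
    # is a weighted mix of the previous smoothed value and the
    # current raw value
    smoothed = []
    last = values[0]
    for value in values:
        last = weight * last + (1 - weight) * value
        smoothed.append(last)
    return smoothed
\end{lstlisting}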
\subsubsection{Demo}
A demonstration video of the drone starting is available \href{https://cloud.joos.io/index.php/s/QRpNi2WnzRt6XrE}{here}.
A demonstration video of the drone landing is available \href{https://cloud.joos.io/index.php/s/tRC7DETDXFo8mPf}{here}.