42 changes: 23 additions & 19 deletions en/operations/data-management.html
@@ -7,18 +7,34 @@
---

<p>
This guide documents how to export data from a Vespa cloud application and how to do mass updates or removals.
See <a href="cloning">cloning applications and data</a>
for how to copy documents from one application to another.
This guide covers data management operations for Vespa Cloud applications,
including automated backups, document export, feed, and bulk updates and removals.
</p>

<h2 id="backup">Automated Backups</h2>
<p>
Prerequisite: Use the latest version of the <a href="../clients/vespa-cli.html">vespa</a>
command-line client.
On commercial and enterprise plans, content clusters are automatically backed up when a
<a href="../reference/applications/deployment.html#backup"><code>&lt;backup&gt;</code></a> element is specified in <em>deployment.xml</em>.
Vespa Cloud manages the backup schedule, storage, and lifecycle with no external tooling required. Backups will run at the configured frequency
while also respecting any <a href="../reference/applications/deployment.html#block-change">block windows</a> defined for the instance.
</p>
<pre>{% highlight xml %}
<instance id="default">
<backup frequency="7d" />
<prod>
<region>aws-us-east-1c</region>
</prod>
</instance>
{% endhighlight %}</pre>
<p>
If you prefer to manage backups yourself, documents can be exported manually using
<code>vespa visit</code> as shown in the <a href="https://github.com/vespa-engine/sample-apps/tree/master/examples/google-cloud/cloud-functions#backup---experimental">
Google Cloud Function example</a>.
</p>



<h2 id="export-documents">Export documents</h2>
{% include note.html content='The examples below use the <a href="../clients/vespa-cli.html">Vespa CLI</a>. Ensure you have the latest version installed.' %}
<p>
To export documents, configure the application to export from,
then select zone, container cluster and schema - example:
@@ -45,21 +61,9 @@ <h2 id="export-documents">Export documents</h2>
Note that this normally does not speed up the exporting process, as the same amount of data is read from the index.
The data transfer out of the Vespa application is smaller with fewer fields.
</p>



<h2 id="backup">Backup</h2>
<p>
Use the <em>visit</em> operations above to extract documents for backup.
</p>
<p>
To back up documents to your own Google Cloud Storage, see
<a href="https://github.com/vespa-engine/sample-apps/tree/master/examples/google-cloud/cloud-functions#backup---experimental">
backup</a> for a Google Cloud Function example.
For copying documents between applications, see <a href="cloning">cloning applications and data</a>.
</p>
<!-- ToDo: this is WIP and AWS coming soon. -->



<h2 id="feed">Feed</h2>
<p>
61 changes: 59 additions & 2 deletions en/reference/applications/deployment.html
@@ -40,6 +40,7 @@
days="mon,wed-fri"
hours="16-23"
time-zone="UTC" />
<backup frequency="7d" granularity="cluster" />
<prod>
<region>aws-us-east-1c</region>
<delay hours="3" minutes="7" seconds="13" />
@@ -249,6 +250,62 @@ <h2 id="block-change">block-change</h2>



<h2 id="backup">backup</h2>
<p>
In <code>&lt;deployment&gt;</code>, <strong>or</strong> <code>&lt;instance&gt;</code>.
Configures scheduled backups of production content clusters. When present, backups will
be created at the specified frequency. Must be placed after any <code>&lt;test&gt;</code> and <code>&lt;staging&gt;</code> tags,
and before <code>&lt;prod&gt;</code>.
</p>
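<p>
The placement rule above can be sketched as follows. This is a minimal illustration, assuming the element is declared
directly under <code>&lt;deployment&gt;</code>; the region name and the <code>12h</code> frequency are example values only.
</p>
<pre>{% highlight xml %}
<deployment version="1.0">
    <test />
    <staging />
    <!-- backup goes after test and staging, and before prod -->
    <backup frequency="12h" />
    <prod>
        <region>aws-us-east-1c</region>
    </prod>
</deployment>
{% endhighlight %}</pre>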
<table class="table">
<thead>
<tr>
<th style="width:150px">Attribute</th>
<th style="width:100px">Mandatory</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>frequency</td>
<td>Yes</td>
<td>A positive integer with a suffix <code>h</code> (hours) or <code>d</code> (days),
e.g. <code>12h</code> or <code>7d</code>. Minimum 1h.</td>
</tr>
<tr>
<td>granularity</td>
<td>No, default <code>cluster</code></td>
<td>
<ul>
<li><code>cluster</code>: all content nodes in the cluster</li>
<li><code>group</code>: all content nodes in a single group</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>
Backup activity does not affect service availability, but it does cost some performance. Use <code>granularity</code>
to control the tradeoff between backup speed and restoration speed:
</p>
<ul>
<li>
A <code>cluster</code> backup takes longer,
as each content node must be temporarily suspended to ensure data integrity. Restoration, however, requires
effectively no content redistribution.
</li>
<li>
A <code>group</code> backup is faster, as an entire group is suspended and backed up simultaneously.
Restoration may, however, require a significant amount of content redistribution, depending on the cluster topology.
</li>
</ul>
<p>
In most situations we recommend <code>cluster</code> backups.
</p>
<p>
<a href="#block-change">Block windows</a> also prevent new backups from starting in the given period.
A backup that has already started is not interrupted: if the time before a block window is too short for a full backup
to complete, the backup continues into the block window.
</p>
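<p>
As an illustration of how the two elements interact, the sketch below combines a block window with a daily
<code>group</code> backup. The attribute values are examples only, reusing the block window from the example at the top of this page.
With this configuration, no new backup would start between 16:00 and 23:00 UTC on Monday and Wednesday through Friday.
</p>
<pre>{% highlight xml %}
<deployment version="1.0">
    <instance id="default">
        <!-- no new backups start inside this window -->
        <block-change days="mon,wed-fri" hours="16-23" time-zone="UTC" />
        <!-- daily backup, one content group at a time -->
        <backup frequency="1d" granularity="group" />
        <prod>
            <region>aws-us-east-1c</region>
        </prod>
    </instance>
</deployment>
{% endhighlight %}</pre>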

<h2 id="upgrade">upgrade</h2>
<p>
In <code>&lt;deployment&gt;</code>, or <code>&lt;instance&gt;</code>.
@@ -479,7 +536,7 @@ <h2 id="dev">dev</h2>
<p>
In <code>&lt;deployment&gt;</code>.
Optionally used to control deployment settings for the <a href="../../operations/environments.html">dev environment</a>.
This can be used to specify a different cloud account, tags, and private endpoints.
This can be used to specify a different cloud account, tags, and private endpoints.
</p>
<table class="table">
<thead>
@@ -600,7 +657,7 @@ <h2 id="endpoints-global">endpoints (global)</h2>
<h2 id="endpoints-dev">endpoints (dev)</h2>
<p>
In <code>&lt;dev&gt;</code>. This allows
<a href="#endpoint-zone">zone endpoint</a>
<a href="#endpoint-zone">zone endpoint</a>
elements for cloud-native private network configuration for
<a href="../../operations/environments.html#dev">dev</a> deployments.
Note that <a href="#endpoint-private">private endpoints</a> are only supported in <code>prod</code>.