# Documents

A document is an object composed of one or more fields. Each field consists of an attribute and its associated value.

Documents function as containers for organizing data, and are the basic building blocks of a Meilisearch database. To search for a document, it must first be added to an index.

# Structure

[Figure: diagram illustrating Meilisearch's document structure]

# Important terms

  • Document: an object which contains data in the form of one or more fields
  • Field: a set of two data items that are linked together: an attribute and a value
  • Attribute: the first part of a field. Acts as a name or description for its associated value.
  • Value: the second part of a field, consisting of data of any valid JSON type
  • Primary Field: a special field that is mandatory in all documents. It contains the primary key and document identifier.
  • Primary Key: the attribute of the primary field. All documents in the same index must possess the same primary key. Its associated value is the document identifier.
  • Document Identifier: the value of the primary field. Every document in a given index must have a unique identifier.

# Dataset format

You can provide your dataset in the following formats:

# JSON

Documents represented as JSON objects are key-value pairs enclosed by curly brackets. As such, any rule that applies to formatting JSON objects also applies to formatting Meilisearch documents. For example, an attribute must be a string, while a value must be a valid JSON data type.

As an example, let's say you are creating an index that contains information about movies. A sample document might look like this:

{
  "id": "1564saqw12ss",
  "title": "Kung Fu Panda",
  "genres": "Children's Animation",
  "release-year": 2008,
  "cast": [
    { "Jack Black": "Po" },
    { "Jackie Chan": "Monkey" }
  ]
}

In the above example, "id", "title", "genres", "release-year", and "cast" are attributes.
Each attribute must be associated with a value, e.g. "Kung Fu Panda" is the value of "title".
At minimum, the document must contain one field with the primary key attribute and a unique document id as its value. Above, that's: "id": "1564saqw12ss".

# NDJSON

NDJSON consists of individual lines, where each line is a complete, valid JSON text and lines are delimited by a newline character. Any rules that apply to formatting NDJSON also apply to Meilisearch documents.

Compared to JSON, NDJSON has better writing performance and is less CPU and memory intensive. It is easier to validate and, unlike CSV, can handle nested structures.

The above JSON document would look like this in NDJSON:

{ "id": "1564saqw12ss", "title": "Kung Fu Panda", "genres": "Children's Animation", "release-year": 2008, "cast": [{ "Jack Black": "Po" }, { "Jackie Chan": "Monkey" }] }
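
As a rough illustration of the one-document-per-line rule, the following Python sketch serializes a list of documents to NDJSON and parses it back. The helper names `to_ndjson` and `from_ndjson` are made up for this example and are not part of any Meilisearch client.

```python
import json

documents = [
    {
        "id": "1564saqw12ss",
        "title": "Kung Fu Panda",
        "genres": "Children's Animation",
        "release-year": 2008,
        "cast": [{"Jack Black": "Po"}, {"Jackie Chan": "Monkey"}],
    },
]


def to_ndjson(docs):
    """Serialize each document as exactly one line of JSON (hypothetical helper)."""
    # json.dumps without `indent` emits no raw newlines, and the default
    # ensure_ascii=True escapes unusual line separators, so every document
    # stays on a single line.
    return "".join(json.dumps(doc) + "\n" for doc in docs)


def from_ndjson(text):
    """Parse NDJSON back into a list of documents, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


ndjson_payload = to_ndjson(documents)
```

Nested values such as `cast` survive the round trip, which is the property that distinguishes NDJSON from CSV here.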

# CSV

CSV files express data as a sequence of values separated by a delimiter character. Currently, Meilisearch only supports the comma (,) delimiter. Any rules that apply to formatting CSV also apply to Meilisearch documents.

Compared to JSON, CSV has better writing performance and is less CPU and memory intensive.

The above JSON document would look like this in CSV:

  "id:string","title:string","genres:string","release-year:number"
  "1564saqw12ss","Kung Fu Panda","Children's Animation","2008"

Since CSV does not support arrays or nested objects, cast cannot be converted to CSV.

TIP

If you don't specify the data type for an attribute, it will default to :string.
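
The conversion above can be sketched in Python with the standard `csv` module. This is only an illustration: `to_typed_csv` is a hypothetical helper, and the rule it uses to pick `:number` (every value in the column is an int or float) is an assumption of this example rather than documented Meilisearch behavior. Fields holding arrays or objects are skipped, as `cast` is above.

```python
import csv
import io

documents = [
    {
        "id": "1564saqw12ss",
        "title": "Kung Fu Panda",
        "genres": "Children's Animation",
        "release-year": 2008,
        "cast": [{"Jack Black": "Po"}, {"Jackie Chan": "Monkey"}],
    },
]


def to_typed_csv(docs, fieldnames):
    """Write the flat fields of `docs` as CSV with `name:type` headers."""

    def type_suffix(name):
        # Assumption: tag a column :number only if all its values are numeric.
        values = [doc[name] for doc in docs if name in doc]
        numeric = bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        return "number" if numeric else "string"

    # CSV cannot express arrays or nested objects, so drop those fields.
    flat = [
        name
        for name in fieldnames
        if not any(isinstance(doc.get(name), (list, dict)) for doc in docs)
    ]

    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([f"{name}:{type_suffix(name)}" for name in flat])
    for doc in docs:
        writer.writerow([doc.get(name, "") for name in flat])
    return buffer.getvalue()


csv_payload = to_typed_csv(
    documents, ["id", "title", "genres", "release-year", "cast"]
)
```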

# Limitations and requirements

Documents have a soft maximum of 1000 fields; beyond that the ranking rules may no longer be effective, leading to undefined behavior.

Additionally, every document must have at least one field: the primary field, which pairs the primary key with a unique document id.

If you try to index a document that's incorrectly formatted, missing a primary key, or possessing the wrong primary key for a given index, it will cause an error and no documents will be added.
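
Because one bad document causes the whole batch to be rejected, it can be worth checking a batch before sending it. The sketch below is a client-side pre-check written for this page, not Meilisearch's own validation: `validate_documents` is a hypothetical helper, and its checks (id is a string or integer, ids are unique within the batch, at most 1000 top-level fields) are assumptions chosen to mirror the requirements described above.

```python
def validate_documents(docs, primary_key, max_fields=1000):
    """Return a list of problems; an empty list means the batch looks valid."""
    problems = []
    seen_ids = set()
    for position, doc in enumerate(docs):
        if not isinstance(doc, dict):
            problems.append(f"document {position}: not a JSON object")
            continue
        if len(doc) > max_fields:
            problems.append(f"document {position}: more than {max_fields} fields")
        if primary_key not in doc:
            problems.append(f"document {position}: missing primary key {primary_key!r}")
            continue
        doc_id = doc[primary_key]
        # Assumption: only strings and integers are treated as usable ids.
        if isinstance(doc_id, bool) or not isinstance(doc_id, (str, int)):
            problems.append(f"document {position}: id must be a string or integer")
            continue
        if doc_id in seen_ids:
            problems.append(f"document {position}: duplicate id {doc_id!r}")
        seen_ids.add(doc_id)
    return problems


batch = [
    {"id": 1, "title": "Kung Fu Panda"},
    {"title": "No identifier here"},
    {"id": 1, "title": "Same id as the first document"},
]
issues = validate_documents(batch, "id")
```

Here `issues` reports the second document (missing primary key) and the third (duplicate id), so the batch would be fixed before upload.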

# Fields

A field is a set of two data items linked together: an attribute and a value. Documents are made up of fields.

An attribute functions a bit like a variable in most programming languages, i.e. it is a name that allows you to store, access, and describe some data. That data is the attribute's value.

Every field has a data type dictated by its value. Every value must be a valid JSON data type.

If a field contains an object, you can refer directly to its internal properties using dot notation: attributeA.objectKeyA. Dot notation also works with nested objects: attributeA.objectKeyA.objectKeyB. This syntax is supported across Meilisearch, including index settings and search parameters.
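
To make the dot notation concrete, here is a small Python sketch that resolves a path such as `attributeA.objectKeyA.objectKeyB` against a document. `get_by_path` is a hypothetical helper for illustration only. It handles nested objects and does not attempt to model how Meilisearch treats arrays.

```python
def get_by_path(document, path):
    """Follow a dot-separated path through nested objects (hypothetical helper)."""
    current = document
    for key in path.split("."):
        # Each step must land on an object that contains the next key.
        if not isinstance(current, dict) or key not in current:
            raise KeyError(path)
        current = current[key]
    return current


document = {
    "id": 1,
    "attributeA": {
        "objectKeyA": {"objectKeyB": "nested value"},
    },
}

inner_object = get_by_path(document, "attributeA.objectKeyA")
inner_value = get_by_path(document, "attributeA.objectKeyA.objectKeyB")
```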

Take note that, in the case of strings, a value can contain at most 65535 positions. Words exceeding the 65535 position limit will be ignored.

You can also apply ranking rules to some fields. For example, you may decide recent movies should be more relevant than older ones.

If you would like to adjust how a field gets handled by Meilisearch, you can do so in the settings.

# Field properties

A field may also possess field properties. Field properties determine the characteristics and behavior of the data added to that field.

At this time, there are two field properties: searchable and displayed. A field can have one, both, or neither of these properties. By default, all fields in a document are both displayed and searchable.

To clarify, a field may be:

  • Searchable but not displayed
  • Displayed but not searchable
  • Both displayed and searchable (default)
  • Neither displayed nor searchable

In the last case, the field will be completely ignored when a search is performed. However, it will still be stored in the document.
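
The four combinations can be modeled with two sets of attribute names. The following Python sketch is a toy model of the concept, not how Meilisearch implements it. The function names and the sample sets are invented for this example.

```python
def describe_field(attribute, searchable, displayed):
    """Report which of the two field properties an attribute has (toy model)."""
    return {
        "searchable": attribute in searchable,
        "displayed": attribute in displayed,
    }


def displayed_view(document, displayed):
    """Keep only displayed fields; the original document is left untouched."""
    return {key: value for key, value in document.items() if key in displayed}


movie = {"id": 1, "title": "Kung Fu Panda", "internal_notes": "do not show"}

# Hypothetical configuration: `internal_notes` is neither searchable nor displayed.
searchable = {"title"}
displayed = {"id", "title"}

notes_properties = describe_field("internal_notes", searchable, displayed)
shown = displayed_view(movie, displayed)
```

In this model `internal_notes` is ignored by search and hidden from results, yet it remains in `movie`, matching the "neither displayed nor searchable" case.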

# Primary field

The primary field is a special field that must be present in all documents. Its attribute is the primary key and its value is the document id.

To learn more, refer to the primary key explanation.

# Upload

By default, Meilisearch limits the size of all payloads—and therefore document uploads—to 100MB.

To upload more documents in one go, you can change the payload size limit when launching Meilisearch with the http-payload-size-limit option, which takes a value in bytes.

./meilisearch --http-payload-size-limit=1048576000

The above command sets the payload limit to roughly 1GB (1048576000 bytes), instead of the 100MB default.

Meilisearch uses a lot of RAM when indexing documents. Be aware of your RAM availability as you increase the size of your batch as this could cause Meilisearch to crash.
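
One way to keep uploads under the payload limit, and to keep batch sizes predictable, is to split documents by their serialized size before sending them. The Python sketch below is a client-side illustration under stated assumptions: `batch_by_size` and `encode` are hypothetical helpers, and sizes are measured on compact JSON, so they only hold if the same encoding is used for the actual upload.

```python
import json

# The value passed to --http-payload-size-limit in the command above.
RAISED_LIMIT_BYTES = 1000 * 1024 * 1024  # 1048576000


def encode(value):
    """Compact JSON encoding, used both for measuring and for sending."""
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


def batch_by_size(docs, limit_bytes):
    """Yield lists of documents whose JSON array encoding fits in limit_bytes."""
    batch, batch_size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for doc in docs:
        doc_size = len(encode(doc))
        if doc_size + 2 > limit_bytes:
            raise ValueError("a single document exceeds the payload limit")
        # One extra byte for the comma separating this document from the last.
        extra = doc_size + (1 if batch else 0)
        if batch_size + extra > limit_bytes:
            yield batch
            batch, batch_size = [], 2
            extra = doc_size
        batch.append(doc)
        batch_size += extra
    if batch:
        yield batch


docs = [{"id": i} for i in range(10)]
batches = list(batch_by_size(docs, limit_bytes=30))
```

A limit well below the server's configured value leaves headroom for both the payload check and memory use during indexing.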

When using the route to add new documents, all documents must be sent in an array even if there is only one document.

curl \
  -X POST 'http://localhost:7700/indexes/movies/documents' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {
      "movie_id": "123sq178",
      "title": "Amelie Poulain"
    }
  ]'
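
The same request can be sketched in Python using only the standard library. This is a minimal sketch, assuming a local instance at `http://localhost:7700` with no API key, as in the curl command. `build_add_documents_request` and `send` are hypothetical helpers, and `send` assumes the response body is JSON. The builder wraps a lone document in an array, since the route expects one.

```python
import json
import urllib.request


def build_add_documents_request(base_url, index_uid, documents):
    """Build (but do not send) a POST request for the add-documents route."""
    # The route expects an array, even for a single document.
    if isinstance(documents, dict):
        documents = [documents]
    body = json.dumps(documents).encode("utf-8")
    url = f"{base_url.rstrip('/')}/indexes/{index_uid}/documents"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_add_documents_request(
    "http://localhost:7700",
    "movies",
    {"movie_id": "123sq178", "title": "Amelie Poulain"},
)
# With a Meilisearch instance running locally: result = send(request)
```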