Commit Graph

8750 Commits

Author SHA1 Message Date
Clémentine Urquizar
2c5c79d68e
Update Tokenizer version to v0.2.1 2021-04-14 18:54:04 +02:00
Clément Renault
c2df51aa95
Merge pull request #156 from meilisearch/stop-words
Stop words
2021-04-14 17:33:06 +02:00
bors[bot]
6359a08cfe
Merge #139
139: Fix commit date & SHA in startup message r=MarinPostma a=shekhirin

Resolves https://github.com/meilisearch/transplant/issues/137
Resolves https://github.com/meilisearch/transplant/issues/138

---
I ran a GitHub Action towards my own dockerhub: https://github.com/shekhirin/transplant/actions/runs/732666353

Startup message now shows correct `Commit SHA` and `Commit date` (changed from `Build date`).
```console
➜ transplant (shekhirin/startup-git-vars) ✔ docker run -it -p 7700:7700 shekhirin/meilisearch:v0.21.0-alpha.2 ./meilisearch --no-analytics=true
Unable to find image 'shekhirin/meilisearch:v0.21.0-alpha.2' locally
v0.21.0-alpha.2: Pulling from shekhirin/meilisearch
bfdacc68c91b: Already exists 
73b1ed30fa0b: Pull complete 
6607217ed754: Pull complete 
Digest: sha256:31bd6ac37e8711ab9d4123cf2ba2f942686569f08d68cfed8643752f381bfb74
Status: Downloaded newer image for shekhirin/meilisearch:v0.21.0-alpha.2

888b     d888          d8b 888 d8b  .d8888b.                                    888
8888b   d8888          Y8P 888 Y8P d88P  Y88b                                   888
88888b.d88888              888     Y88b.                                        888
888Y88888P888  .d88b.  888 888 888  "Y888b.    .d88b.   8888b.  888d888 .d8888b 88888b.
888 Y888P 888 d8P  Y8b 888 888 888     "Y88b. d8P  Y8b     "88b 888P"  d88P"    888 "88b
888  Y8P  888 88888888 888 888 888       "888 88888888 .d888888 888    888      888  888
888   "   888 Y8b.     888 888 888 Y88b  d88P Y8b.     888  888 888    Y88b.    888  888
888       888  "Y8888  888 888 888  "Y8888P"   "Y8888  "Y888888 888     "Y8888P 888  888

Database path:          "./data.ms"
Server listening on:    "http://0.0.0.0:7700"
Environment:            "development"
Commit SHA:             "038f1c740198f974743ba87fce7b74a8d0b71b5c"
Commit date:            "2021-04-09"
Package version:        "0.21.0-alpha.2"
Sentry DSN:             "https://5ddfa22b95f241198be2271aaf028653@sentry.io/3060337"
Anonymous telemetry:    "Disabled"

No master key found; The server will accept unidentified requests. If you need some protection in development mode, please export a key: export MEILI_MASTER_KEY=xxx

Documentation:          https://docs.meilisearch.com
Source code:            https://github.com/meilisearch/meilisearch
Contact:                https://docs.meilisearch.com/resources/contact.html or bonjour@meilisearch.com

[2021-04-09T10:29:49Z INFO  actix_server::builder] Starting 2 workers
[2021-04-09T10:29:49Z INFO  actix_server::builder] Starting "actix-web-service-0.0.0.0:7700" service on 0.0.0.0:7700
[2021-04-09T10:29:49Z INFO  meilisearch_http::index_controller::uuid_resolver::actor] uuid resolver started
[2021-04-09T10:29:49Z INFO  meilisearch_http::index_controller::update_actor::actor] Started update actor.
```

Endpoint also works as expected (`buildDate` -> `commitDate`)
```console
➜ transplant (shekhirin/startup-git-vars) ✔ curl http://localhost:7700/version
{"commitSha":"038f1c740198f974743ba87fce7b74a8d0b71b5c","commitDate":"2021-04-09","pkgVersion":"0.21.0-alpha.2"}
```

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-04-13 17:38:47 +00:00
Alexey Shekhirin
f87afbc558
fix(http): commit date & SHA in startup message 2021-04-13 20:16:18 +03:00
bors[bot]
8df5f73706
Merge #133
133: Implement stats route r=MarinPostma a=shekhirin

Resolves https://github.com/meilisearch/transplant/issues/73

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-04-13 17:03:33 +00:00
Alexey Shekhirin
9eaf048a06
fix(http): use BTreeMap instead of HashMap to preserve stats order 2021-04-13 11:59:07 +03:00
tamo
dcb00b2e54
test a new implementation of the stop_words 2021-04-12 18:35:33 +02:00
tamo
da036dcc3e
Revert "Integrate the stop_words in the querytree"
This reverts commit 12fb509d84.
We revert this commit because it's causing the bug #150.
The initial algorithm we implemented for the stop_words was:

1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
   integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
   **always** looking for something starting with “t he” or “th e”
   instead of ignoring the word completely.

For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem were someone mistyped a normal word and
ended up typing a stop_word.

For example imagine someone searching for the music “Won't he do it”.
If that person misplace one space and write “Won' the do it” then we
will loose a part of the request.

One fix would be to update our query tree to something like that:

---------------------
OR
  OR
    TOLERANT hobbit # the first option is to ignore the stop_word
    AND
      CONSECUTIVE   # the second option is to do as we are doing
        EXACT t	    # currently
        EXACT he
      TOLERANT hobbit
---------------------

This would increase drastically the size of our query tree on request
with a lot of stop_words. For example think of “The Lord Of The Rings”.

For now whatsoever we decided we were going to ignore this problem and consider
that it doesn't reduce too much the relevancy of the search to do that
while it improves the performances.
2021-04-12 18:35:33 +02:00
Clément Renault
f9eab6e0de
Merge pull request #151 from meilisearch/release-drafter
Add release drafter files
2021-04-12 10:25:52 +02:00
Clémentine Urquizar
6a128d4ec7
Add release drafter files 2021-04-12 10:18:39 +02:00
Clément Renault
5efe67f375
Merge pull request #154 from shekhirin/shekhirin/fix-settings-serde-tests
test(http): fix and refactor settings assert_(ser|de)_tokens
2021-04-11 10:52:38 +02:00
Alexey Shekhirin
3af8fa194c
test(http): combine settings assert_(ser|de)_tokens into 1 test 2021-04-10 12:13:59 +03:00
Clément Renault
0d09c64dde
Merge pull request #148 from shekhirin/shekhirin/setting-enum
refactor(http, update): introduce setting enum
2021-04-09 22:48:58 +02:00
Alexey Shekhirin
adfdb99abc
feat(http): calculate updates' and uuids' dbs size 2021-04-09 15:59:12 +03:00
Alexey Shekhirin
ae1655586c
fixes after review 2021-04-09 14:40:48 +03:00
Alexey Shekhirin
698a1ea582
feat(http): store processing as RwLock<Option<Uuid>> in index_actor 2021-04-09 14:34:43 +03:00
Alexey Shekhirin
87412f63ef
feat(http): implement is_indexing for stats 2021-04-09 14:34:42 +03:00
Alexey Shekhirin
09d9a29176
test(http): server & index stats 2021-04-09 14:34:42 +03:00
Alexey Shekhirin
dd9eae8c26
feat(http): stats route 2021-04-09 14:34:42 +03:00
bors[bot]
a1d04fbff5
Merge #136
136: Rename update status "pending" into "enqueued" r=curquiza a=curquiza

Closes #107 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-08 16:46:12 +00:00
bors[bot]
dd1a08087b
Merge #134
134: fix(http, index): init analyzer with optional stop words r=MarinPostma a=shekhirin

Also bump `milli` and `meilisearch-tokenizer` packages versions

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-04-08 16:13:15 +00:00
Alexey Shekhirin
51ba1bd7d3
fix(http, index): init analyzer with optional stop words
Next release

update tokenizer
2021-04-08 17:16:13 +03:00
Alexey Shekhirin
84c1dda39d
test(http): setting enum serialize/deserialize 2021-04-08 17:03:40 +03:00
Alexey Shekhirin
dc636d190d
refactor(http, update): introduce setting enum 2021-04-08 17:03:40 +03:00
bors[bot]
f881e8691e
Merge #135
135: Add stop words r=curquiza a=irevoire

closes #21 

Co-authored-by: tamo <tamo@meilisearch.com>
2021-04-08 11:29:00 +00:00
bors[bot]
94c0858c27
Merge #1327
1327: Update link after branch renaming r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-08 05:47:20 +00:00
Clémentine Urquizar
6aaa4a8e19
Update link after branch renaming 2021-04-07 19:47:48 +02:00
Clémentine Urquizar
cb23775d18
Rename pending into enqueued 2021-04-07 19:46:36 +02:00
bors[bot]
0344cf5874
Merge #122
122: Update display r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-07 12:33:25 +00:00
bors[bot]
4a1b033765
Merge #1318
1318: Update README.md for contributions r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-06 23:11:29 +00:00
tamo
dcd60a5b45
add more tests for the stop_words 2021-04-06 18:29:38 +02:00
tamo
b1962c8e02
remove legacy files from meilisearch that have been replaced by a macro in routes/settings/mod.rs 2021-04-06 16:29:04 +02:00
tamo
40ef9a3c6a
push a first implementation of the stop_words 2021-04-06 16:29:04 +02:00
Clément Renault
2bcdd8844c
Merge pull request #141 from meilisearch/reorganize-criterion
reorganize criterion
2021-04-01 19:50:16 +02:00
tamo
0a4bde1f2f
update the default ordering of the criterion 2021-04-01 19:45:31 +02:00
Clément Renault
ee3f93c029
Merge pull request #136 from shekhirin/index-fields-ids-distribution-cache
feat(index): store fields distribution in index
2021-04-01 18:36:21 +02:00
Alexey Shekhirin
2658c5c545
feat(index): update fields distribution in clear & delete operations
fixes after review

bump the version of the tokenizer

implement a first version of the stop_words

The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests

Integrate the stop_words in the querytree

remove the stop_words from the querytree except if it was a prefix or a typo

more fixes after review
2021-04-01 19:12:35 +03:00
Alexey Shekhirin
27c7ab6e00
feat(index): store fields distribution in index 2021-04-01 18:35:19 +03:00
bors[bot]
2206a44baf
Merge #132
132: Next release (alpha2) r=MarinPostma a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-04-01 15:25:45 +00:00
Clémentine Urquizar
4ee6ce7871
Next release 2021-04-01 17:16:16 +02:00
bors[bot]
6cb8052d3d
Merge #104
104: Update all the response format (issue #64) r=MarinPostma a=irevoire

closes #64 

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: tamo <tamo@meilisearch.com>
2021-04-01 14:22:57 +00:00
tamo
73973e2b9e
fix more settings routes 2021-04-01 15:50:45 +02:00
bors[bot]
89e05fc6c5
Merge #113
113: snapshots r=MarinPostma a=MarinPostma

 This pr adds support for snapshoting.

The snapshoting process for an index requires that no other update is processing at the same time. A mutex lock has been added to prevent a snapshot from occuring at the same time as an update, while still premitting updates to be pushed.

The list of the indexes to snapshot is first retrieved from the `UuidResolver` which also performs its snapshot.

This list is passed to the update store, which attempts to acquire a lock on the update store while it snaphots itself and it's associated index store.

 This means that a snapshot can only be completed once all indexes have finished their ongoing update.

This pr also adds refactoring of the code to allow unit testing and mocking, and unit test the snapshot creation.

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: tamo <irevoire@protonmail.ch>
Co-authored-by: marin <postma.marin@protonmail.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-04-01 13:16:00 +00:00
Marin Postma
248e9b3808
Merge remote-tracking branch 'origin/main' into snapshots 2021-04-01 15:10:33 +02:00
Clément Renault
67e25f8724
Merge pull request #128 from meilisearch/stop-words
Stop words
2021-04-01 14:02:37 +02:00
tamo
12fb509d84
Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
tamo
62a8f1d707
bump the version of the tokenizer 2021-04-01 13:49:22 +02:00
tamo
79c63049d7
update the settings routes 2021-04-01 11:52:26 +02:00
Irevoire
96cffeab1e
update all the response format to be ISO with meilisearch, see #64 2021-04-01 11:43:03 +02:00