gnupg/regexp/parse-unidata.awk

#
# parse-unidata.awk - generate a table (unicode_case_mapping_upper)
#
# Copyright (C) 2020 g10 Code GmbH
#
# This file is part of GnuPG.
#
# GnuPG is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# GnuPG is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <https://www.gnu.org/licenses/>.
#

# Parse the Unicode character data from:
#   https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
# to generate the upper-case mapping table.
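#
# Usage sketch (an assumed invocation, not taken from this file; the
# output file name is illustrative).  GNU Awk is expected to need
# --non-decimal-data for the hexadecimal conversion done below:
#   awk -f parse-unidata.awk UnicodeData.txt >_unicode_mapping.c
#   gawk --non-decimal-data -f parse-unidata.awk UnicodeData.txt >_unicode_mapping.c
#
# Worked example: the UnicodeData.txt record
#   00E0;LATIN SMALL LETTER A WITH GRAVE;Ll;0;L;0061 0300;;;;N;LATIN SMALL LETTER A GRAVE;;00C0;;00C0
# is a letter (Ll) above ASCII with a non-empty field 13, so it should
# yield the table entry
#   { 0x00e0, 0x00c0 },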

BEGIN {
    print("/* Generated from UnicodeData.txt */")
    print("")
    print("static const struct casemap unicode_case_mapping_upper[] = {")
    FS = ";"
    count = 0
}

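{
    # UnicodeData.txt fields used here: $1 code point (hex), $2 name,
    # $3 general category, $13/$14/$15 simple upper/lower/title mapping.
    # The conversion relies on the awk accepting "0x..." strings as hex.
    code = int("0x" $1)
    name = $2
    class = $3
    upper = $13
    lower = $14
    title = $15
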
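    # Skip ASCII.
    if (code <= 127) {
	next
    }
    # Skip everything outside the Basic Multilingual Plane.
    if (code > 65535) {
	next
    }
    # Skip everything that is not a letter (general category L*).
    if (class !~ /^L/) {
	next
    }
    if (upper != "") {
	# Pass the data as arguments rather than as the format string.
	printf("\t{ 0x%s, 0x%s },", tolower($1), tolower(upper))
	count++
	# Emit four entries per output line.
	if ((count % 4) == 0) {
	    print("")
	}
    }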
}

END {
    print("\n};")
}

# History:
#
# 2020-04-03 15:30:08 +09:00  NIIBE Yutaka <gniibe@fsij.org>
#   gpg: Add regular expression support.
#   * AUTHORS, COPYING.other: Update.
#   * Makefile.am (SUBDIRS): Add regexp sub directory.
#   * configure.ac (DISABLE_REGEX): Remove.
#   * g10/Makefile.am (needed_libs): Add libregexp.a.
#   * g10/trustdb.c: Remove DISABLE_REGEX support.
#   * regexp/LICENSE, regexp/jimregexp.c, regexp/jimregexp.h,
#   regexp/utf8.c, regexp/utf8.h: New from Jim Tcl.
#   * regexp/UnicodeData.txt: New from Unicode.
#   * regexp/Makefile.am, regexp/parse-unidata.awk: New.
#   * tests/openpgp/Makefile.am: Remove DISABLE_REGEX support.
#   * tools/Makefile.am: Remove DISABLE_REGEX support.
#
#   Backport master commit of: ba247a114c75a84473c11c1484013b09fbb9bcd1
#   GnuPG-bug-id: 4843
#   Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>
			`# parse-unidata.awk - generate a table (unicode_case_mapping_upper)`
			`#`
			`# Copyright (C) 2020 g10 Code GmbH`
			`#`
			`# This file is part of GnuPG.`
			`#`
			`# GnuPG is free software; you can redistribute it and/or modify`
			`# it under the terms of the GNU General Public License as published by`
			`# the Free Software Foundation; either version 3 of the License, or`
			`# (at your option) any later version.`
			`#`
			`# GnuPG is distributed in the hope that it will be useful,`
			`# but WITHOUT ANY WARRANTY; without even the implied warranty of`
			`# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`
			`# GNU General Public License for more details.`
			`#`
			`# You should have received a copy of the GNU General Public License`
			`# along with this program; if not, see <https://www.gnu.org/licenses/>.`
			`#`

			`# Parse the unicode data from:`
			`# https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt`
			`# to generate case mapping table`

			`BEGIN {`
			`print("/* Generated from UnicodeData.txt */")`
			`print("")`
			`print("static const struct casemap unicode_case_mapping_upper[] = {")`
			`FS = ";"`
			`count = 0`
			`}`

			`{`
#
# 2020-04-15 14:10:08 +09:00  NIIBE Yutaka <gniibe@fsij.org>
#   regexp: Fix generation of _unicode_mapping.c.
#   * configure.ac (AWK_HEX_NUMBER_OPTION): Detect GNU Awk.
#   * regexp/Makefile.am: Use AWK_HEX_NUMBER_OPTION.
#   * regexp/parse-unidata.awk: Don't use strtonum.
#
#   Backport master commit of: 50b320952e99ea20f9b77c6c501280fe37fd2598
#   GnuPG-bug-id: 4915
#   Signed-off-by: NIIBE Yutaka <gniibe@fsij.org>