
StarlangSoftware/AnnotatedTree-Php


Constituency TreeBank

A treebank is a corpus in which the sentences in each language are syntactically (and, where necessary, morphologically) annotated. In treebanks, the syntactic annotation usually follows constituency and/or dependency structure.

Treebanks annotated for the syntactic or semantic structure of sentences are essential for developing state-of-the-art statistical natural language processing (NLP) systems, including part-of-speech taggers, syntactic parsers, and machine translation systems. There are two main groups of syntactic treebanks: those annotated for constituency (phrase structure) and those annotated for dependency structure.

Data Format

We extend the original Penn Treebank format with additional annotation layers, each given between curly braces. For example, the word 'problem' in a sentence is represented in the standard Penn Treebank notation as follows:

(NN problem)

After all levels of processing are finished, the same word is stored in the system in the following form:

(NN {turkish=sorunu} {english=problem} 
{morphologicalAnalysis=sorun+NOUN+A3SG+PNON+ACC}
{metaMorphemes=sorun+yH}
{semantics=TUR10-0703650})

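The layers are largely self-explanatory: the 'turkish' tag shows the original Turkish word; the 'english' tag shows the corresponding English word; the 'morphologicalAnalysis' tag shows the correct morphological parse of that word; the 'metaMorphemes' tag shows the word split into its meta-morphemes; and the 'semantics' tag shows the ID of the correct sense of that word. Two further layers, not present in the example above, may also appear: the 'namedEntity' tag shows the named entity tag of that word, and the 'propbank' tag shows the semantic role of that word for the verb synset ID (the frame ID in the frame file), which is also given in that tag.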
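To make the layered format concrete, here is a minimal PHP sketch that splits one annotated leaf into its constituent tag and a map from layer names to values. The function parseAnnotatedLeaf is hypothetical and is not part of this library's API; it assumes that layer values never contain a closing curly brace, which holds for the example above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): splits an annotated leaf such as
 * "(NN {turkish=sorunu} {english=problem})" into its constituent tag and a map
 * from layer name to layer value.
 *
 * @param string $leaf annotated leaf in the extended Penn Treebank format
 * @return array{tag: string, layers: array<string, string>}
 */
function parseAnnotatedLeaf(string $leaf): array
{
    $leaf = trim($leaf);
    // Drop the outer parentheses around the leaf.
    if (str_starts_with($leaf, '(') && str_ends_with($leaf, ')')) {
        $leaf = substr($leaf, 1, -1);
    }
    // The constituent tag is the first whitespace-delimited token.
    $parts = preg_split('/\s+/', trim($leaf), 2);
    $tag = $parts[0];
    $rest = $parts[1] ?? '';

    // Each layer is written as {name=value}; assumes values contain no '}'.
    $layers = [];
    if (preg_match_all('/\{([^=}]+)=([^}]*)\}/', $rest, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $layers[$match[1]] = $match[2];
        }
    }
    return ['tag' => $tag, 'layers' => $layers];
}

$leaf = "(NN {turkish=sorunu} {english=problem}
{morphologicalAnalysis=sorun+NOUN+A3SG+PNON+ACC}
{metaMorphemes=sorun+yH}
{semantics=TUR10-0703650})";

$parsed = parseAnnotatedLeaf($leaf);
echo $parsed['tag'], PHP_EOL;                             // NN
echo $parsed['layers']['morphologicalAnalysis'], PHP_EOL; // sorun+NOUN+A3SG+PNON+ACC
```

A regular expression is enough here because each layer is a flat name=value pair; nested structures would need a real tokenizer.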

Annotated TreeBanks

Penn-Treebank (15 Words)

Penn-Treebank (20 Words)

Video Lectures

For Developers

You can also see the Java, Python, Cython, C++, C, Swift, Js, or C# repositories.

For Contributors

composer.json file

  1. The autoload section is required so that this package can be imported by other projects.
  "autoload": {
    "psr-4": {
      "olcaytaner\\WordNet\\": "src/"
    }
  },
  1. List dependencies exhaustively: every package used directly in the code must be given here, and indirect (transitive) dependencies should be listed as well.
  "require-dev": {
    "phpunit/phpunit": "11.4.0",
    "olcaytaner/dictionary": "1.0.0",
    "olcaytaner/xmlparser": "1.0.1",
    "olcaytaner/morphologicalanalysis": "1.0.0"
  }

Data files

  1. Add data files to the project folder. Subprojects should include all data files of the parent projects.

Php files

  1. Do not forget to comment each function.
    /**
     * Returns true if the specified semantic relation type is present in the relations list.
     *
     * @param SemanticRelationType $relationType element whose presence in the list is to be tested
     * @return bool true if the specified semantic relation type is present in the relations list
     */
    public function containsRelationType(SemanticRelationType $relationType): bool{
        foreach ($this->relations as $relation){
            if ($relation instanceof SemanticRelation && $relation->getRelationType() == $relationType){
                return true;
            }
        }
        return false;
    }
  1. Function names should follow camelCase.
    public function getRelation(int $index): Relation{
  1. Write getter and setter methods.
    public function getOrigin(): ?string
    public function setName(string $name): void
  1. Write tests in the standard PHPUnit style by extending the TestCase class. Use setUp when necessary.
use PHPUnit\Framework\TestCase;

class WordNetTest extends TestCase
{
    private WordNet $turkish;

    protected function setUp(): void
    {
        ini_set('memory_limit', '450M');
        $this->turkish = new WordNet();
    }

    public function testSize(): void
    {
        $this->assertEquals(78327, $this->turkish->size());
    }
}
  1. Enumerated types should be declared with enum.
enum CategoryType
{
    case MATHEMATICS;
    case SPORT;
    case MUSIC;
    case SLANG;
    case BOTANIC;
    // ...
}
  1. PHP does not support constructor overloading, so if a class needs multiple constructors, define them as constructor1, constructor2, ..., and call the appropriate one from the single __construct method.
    public function constructor1(string $path, string $fileName): void
    public function constructor2(string $path, string $extension, int $index): void
    public function __construct(string $path, string $extension, ?int $index = null)
  1. Implement the __toString method when an object needs a string representation.
    public function __toString(): string
  1. Use the xmlparser package to parse XML files.
  $doc = new XmlDocument("../test.xml");
  $doc->parse();
  $root = $doc->getFirstChild();
  $firstChild = $root->getFirstChild();
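The constructor, getter, and __toString conventions above can be combined as in the following sketch. The class TreeFileSet, its fields, and the "<index>.<extension>" file naming are hypothetical, chosen only to illustrate how __construct dispatches to constructor1 or constructor2 depending on whether $index is given.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical class illustrating the constructor1/constructor2 convention;
 * the class name, fields and naming scheme are invented for this sketch.
 */
class TreeFileSet
{
    private string $path;
    private array $fileNames = [];

    /**
     * Builds the set from a single explicitly named file.
     *
     * @param string $path directory containing the file
     * @param string $fileName name of the file
     */
    public function constructor1(string $path, string $fileName): void
    {
        $this->path = $path;
        $this->fileNames = [$fileName];
    }

    /**
     * Builds the set from a numbered file named "<index>.<extension>".
     *
     * @param string $path directory containing the file
     * @param string $extension file extension
     * @param int $index file number
     */
    public function constructor2(string $path, string $extension, int $index): void
    {
        $this->path = $path;
        $this->fileNames = [$index . '.' . $extension];
    }

    /**
     * The single PHP constructor; dispatches on whether $index was supplied.
     */
    public function __construct(string $path, string $extension, ?int $index = null)
    {
        if ($index === null) {
            // Two-argument form: the second argument is treated as a file name.
            $this->constructor1($path, $extension);
        } else {
            $this->constructor2($path, $extension, $index);
        }
    }

    /**
     * @return string directory of the file set
     */
    public function getPath(): string
    {
        return $this->path;
    }

    /**
     * @return string "path/file1,file2,..." representation of the set
     */
    public function __toString(): string
    {
        return $this->path . '/' . implode(',', $this->fileNames);
    }
}

echo new TreeFileSet('trees', 'sample.txt'), PHP_EOL; // trees/sample.txt
echo new TreeFileSet('trees', 'train', 7), PHP_EOL;   // trees/7.train
```

Keeping each variant in its own method leaves __construct as a thin dispatcher, and each variant can carry its own doc comment.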

About

Annotated Constituency Treebank Library
